Cambridge studies in advanced mathematics 105


Additive 
Combinatorics 


TERENCE TAO
VAN H. VU







CAMBRIDGE STUDIES IN 
ADVANCED MATHEMATICS 105 


EDITORIAL BOARD 
B. BOLLOBAS, W. FULTON, A. KATOK, F. KIRWAN, 
P. SARNAK, B. SIMON, B. TOTARO 


ADDITIVE COMBINATORICS 


Additive combinatorics is the theory of counting additive structures in sets. This theory 
has seen exciting developments and dramatic changes in direction in recent years, thanks 
to its connections with areas such as number theory, ergodic theory and graph theory. This 
graduate level textbook will allow students and researchers easy entry into this fascinating 
field. Here, for the first time, the authors bring together, in a self-contained and systematic 
manner, the many different tools and ideas that are used in the modern theory, presenting 
them in an accessible, coherent, and intuitively clear manner, and providing immediate 
applications to problems in additive combinatorics. The power of these tools is well 
demonstrated in the presentation of recent advances such as the Green-Tao theorem on 
arithmetic progressions and Erdős distance problems, and the developing field of 
sum-product estimates. The text is supplemented by a large number of exercises and new 
material. 


TERENCE TAO is a professor in the Department of Mathematics at the University of
California, Los Angeles. 


VAN VU is a professor in the Department of Mathematics at Rutgers University,
New Jersey. 


CAMBRIDGE STUDIES IN ADVANCED MATHEMATICS 

Editorial Board: 

B. Bollobas, W. Fulton, A. Katok, F. Kirwan, P. Sarnak, B. Simon, B. Totaro 

All the titles listed below can be obtained from good booksellers or from Cambridge University Press;
for a complete listing visit www.cambridge.org/uk/series/&Series.asp?code=CSAM.


49 R. Stanley Enumerative combinatorics I 

50 I. Porteous Clifford algebras and the classical groups 

51 M. Audin Spinning tops 

52 V. Jurdjevic Geometric control theory 

53 H. Volklein Groups as Galois groups 

54 J. Le Potier Lectures on vector bundles 

55 D. Bump Automorphic forms and representations 

56 G. Laumon Cohomology of Drinfeld modular varieties II

57 D.M. Clark & B.A. Davey Natural dualities for the working algebraist 
58 J. McCleary A user’s guide to spectral sequences II 

59 P. Taylor Practical foundations of mathematics 

60 M.P. Brodmann & R.Y. Sharp Local cohomology 

61 J.D. Dixon et al. Analytic pro-P groups 

62 R. Stanley Enumerative combinatorics II

63 R.M. Dudley Uniform central limit theorems 

64 J. Jost & X. Li-Jost Calculus of variations 

65 A.J. Berrick & M.E. Keating An introduction to rings and modules 

66 S. Morosawa Holomorphic dynamics 

67 A.J. Berrick & M.E. Keating Categories and modules with K-theory in view 
68 K. Sato Levy processes and infinitely divisible distributions 

69 H. Hida Modular forms and Galois cohomology 

70 R. Iorio & V. Iorio Fourier analysis and partial differential equations 
71 R. Blei Analysis in integer and fractional dimensions 

72 F. Borceux & G. Janelidze Galois theories

73 B. Bollobas Random graphs 

74 R.M. Dudley Real analysis and probability 

75 T. Sheil-Small Complex polynomials 

76 C. Voisin Hodge theory and complex algebraic geometry I 

77 C. Voisin Hodge theory and complex algebraic geometry II 

78 V. Paulsen Completely bounded maps and operator algebras 

79 F. Gesztesy & H. Holden Soliton Equations and their Algebro-Geometric Solutions Volume 1 
81 Shigeru Mukai An Introduction to Invariants and Moduli 

82 G. Tourlakis Lectures in logic and set theory I 

83 G. Tourlakis Lectures in logic and set theory II

84 R.A. Bailey Association Schemes 

85 James Carlson, Stefan Müller-Stach, & Chris Peters Period Mappings and Period Domains
86 J.J. Duistermaat & J.A.C. Kolk Multidimensional Real Analysis I 

87 J.J. Duistermaat & J.A.C. Kolk Multidimensional Real Analysis II

89 M. Golumbic & A.N. Trenk Tolerance Graphs 

90 L.H. Harper Global Methods for Combinatorial Isoperimetric Problems 
91 I. Moerdijk & J. Mrcun Introduction to Foliations and Lie Groupoids 
92 Janos Kollar, Karen E. Smith, & Alessio Corti Rational and Nearly Rational Varieties 
93 David Applebaum Lévy Processes and Stochastic Calculus 

95 Martin Schechter An Introduction to Nonlinear Analysis 


Additive Combinatorics 


TERENCE TAO, VAN VU 






CAMBRIDGE
UNIVERSITY PRESS


CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo


Cambridge University Press 
The Edinburgh Building, Cambridge CB2 2RU, UK
Published in the United States of America by Cambridge University Press, New York 


www.cambridge.org 
Information on this title: www.cambridge.org/9780521853866


© Cambridge University Press 2006 


This publication is in copyright. Subject to statutory exception and to the provision of 
relevant collective licensing agreements, no reproduction of any part may take place 
without the written permission of Cambridge University Press. 


First published in print format 2006 


ISBN-13 978-0-511-24530-5 eBook (EBL)
ISBN-10 0-511-24530-0 eBook (EBL)


ISBN-13 978-0-521-85386-6 hardback
ISBN-10 0-521-85386-9 hardback


Cambridge University Press has no responsibility for the persistence or accuracy of URLs
for external or third-party internet websites referred to in this publication, and does not 
guarantee that any content on such websites is, or will remain, accurate or appropriate. 


To our families 


Contents


Prologue

1 The probabilistic method
  1.1 The first moment method
  1.2 The second moment method
  1.3 The exponential moment method
  1.4 Correlation inequalities
  1.5 The Lovász local lemma
  1.6 Janson’s inequality
  1.7 Concentration of polynomials
  1.8 Thin bases of higher order
  1.9 Thin Waring bases
  1.10 Appendix: the distribution of the primes

2 Sum set estimates
  2.1 Sum sets
  2.2 Doubling constants
  2.3 Ruzsa distance and additive energy
  2.4 Covering lemmas
  2.5 The Balog–Szemerédi–Gowers theorem
  2.6 Symmetry sets and imbalanced partial sum sets
  2.7 Non-commutative analogs
  2.8 Elementary sum-product estimates

3 Additive geometry
  3.1 Additive groups
  3.2 Progressions
  3.3 Convex bodies
  3.4 The Brunn–Minkowski inequality
  3.5 Intersecting a convex set with a lattice
  3.6 Progressions and proper progressions

4 Fourier-analytic methods
  4.1 Basic theory
  4.2 $L^p$ theory
  4.3 Linear bias
  4.4 Bohr sets
  4.5 $\Lambda(p)$ constants, $B_h[g]$ sets, and dissociated sets
  4.6 The spectrum of an additive set
  4.7 Progressions in sum sets

5 Inverse sum set theorems
  5.1 Minimal size of sum sets and the e-transform
  5.2 Sum sets in vector spaces
  5.3 Freiman homomorphisms
  5.4 Torsion and torsion-free inverse theorems
  5.5 Universal ambient groups
  5.6 Freiman’s theorem in an arbitrary group

6 Graph-theoretic methods
  6.1 Basic Notions
  6.2 Independent sets, sum-free subsets, and Sidon sets
  6.3 Ramsey theory
  6.4 Proof of the Balog–Szemerédi–Gowers theorem
  6.5 Plünnecke’s theorem

7 The Littlewood–Offord problem
  7.1 The combinatorial approach
  7.2 The Fourier-analytic approach
  7.3 The Esséen concentration inequality
  7.4 Inverse Littlewood–Offord results
  7.5 Random Bernoulli matrices
  7.6 The quadratic Littlewood–Offord problem

8 Incidence geometry
  8.1 The crossing number of a graph
  8.2 The Szemerédi–Trotter theorem
  8.3 The sum-product problem in R
  8.4 Cell decompositions and the distinct distances problem
  8.5 The sum-product problem in other fields

9 Algebraic methods
  9.1 The combinatorial Nullstellensatz
  9.2 Restricted sum sets
  9.3 Snevily’s conjecture
  9.4 Finite fields
  9.5 Davenport’s problem
  9.6 Kemnitz’s conjecture
  9.7 Stepanov’s method
  9.8 Cyclotomic fields, and the uncertainty principle

10 Szemerédi’s theorem for k = 3
  10.1 General strategy
  10.2 The small torsion case
  10.3 The integer case
  10.4 Quantitative bounds
  10.5 An ergodic argument
  10.6 The Szemerédi regularity lemma
  10.7 Szemerédi’s argument

11 Szemerédi’s theorem for k > 3
  11.1 Gowers uniformity norms
  11.2 Hard obstructions to uniformity
  11.3 Proof of Theorem 11.6
  11.4 Soft obstructions to uniformity
  11.5 The infinitary ergodic approach
  11.6 The hypergraph approach
  11.7 Arithmetic progressions in the primes

12 Long arithmetic progressions in sum sets
  12.1 Introduction
  12.2 Proof of Theorem 12.4
  12.3 Generalizations and variants
  12.4 Complete and subcomplete sequences
  12.5 Proof of Theorem 12.17
  12.6 Further applications

Bibliography
Index


Prologue 





This book arose out of lecture notes developed by us while teaching courses on 
additive combinatorics at the University of California, Los Angeles and the Uni- 
versity of California, San Diego. Additive combinatorics is currently a highly 
active area of research for several reasons, for example its many applications to 
additive number theory. One remarkable feature of the field is the use of tools 
from many diverse fields of mathematics, including elementary combinatorics, 
harmonic analysis, convex geometry, incidence geometry, graph theory, proba- 
bility, algebraic geometry, and ergodic theory; this wealth of perspectives makes 
additive combinatorics a rich, fascinating, and multi-faceted subject. There are still 
many major problems left in the field, and it seems likely that many of these will 
require a combination of tools from several of the areas mentioned above in order 
to solve them. 

The main purpose of this book is to gather all these diverse tools in one location, 
present them in a self-contained and introductory manner, and illustrate their appli- 
cation to problems in additive combinatorics. Many aspects of this material have 
already been covered in other papers and texts (and in particular several earlier 
books [168], [257], [116] have focused on some of the aspects of additive combi- 
natorics), but this book attempts to present as many perspectives and techniques 
as possible in a unified setting. 

Additive combinatorics is largely concerned with the additive structure¹ of sets.
To clarify what we mean by “additive structure”, let us introduce the following
definitions.


1 We will also occasionally consider the multiplicative structure of sets as well; we will refer to the
combined study of such structures as arithmetic combinatorics.


Definition 0.1 An additive group is any abelian group $Z$ with group operation $+$.
Note that we can define a multiplication operation $nx \in Z$ whenever $n \in \mathbf{Z}$ and
$x \in Z$ in the usual manner: thus $3x = x + x + x$, $-2x = -x - x$, etc. An additive
set is a pair $(A, Z)$, where $Z$ is an additive group, and $A$ is a finite non-empty subset
of $Z$. We often abbreviate an additive set $(A, Z)$ simply as $A$, and refer to $Z$ as the
ambient group of the additive set. If $A$, $B$ are additive sets in $Z$, we define the sum
set

$$A + B := \{a + b : a \in A, b \in B\}$$

and difference set

$$A - B := \{a - b : a \in A, b \in B\}.$$

Also, we define the iterated sumset $kA$ for $k \in \mathbf{Z}^+$ by

$$kA := \{a_1 + \cdots + a_k : a_1, \ldots, a_k \in A\}.$$

We caution that the sumset $kA$ is usually distinct from the dilation $k \cdot A$ of $A$,
defined by

$$k \cdot A := \{ka : a \in A\}.$$


For us, typical examples of additive groups $Z$ will be the integers $\mathbf{Z}$, a cyclic
group $\mathbf{Z}_N$, a Euclidean space $\mathbf{R}^n$, or a finite field geometry $F_p^n$. As the notation
suggests, we will eventually be viewing additive sets as “intrinsic” objects, which 
can be embedded inside any number of different ambient groups; this is some- 
what similar to how a manifold can be thought of intrinsically, or alternatively 
can be embedded into an ambient space. To make these ideas rigorous we will 
need to develop the theory of Freiman homomorphisms, but we will defer this to 
Section 5.3. 
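
The sum set, difference set, iterated sumset and dilation of Definition 0.1 can be sketched in a few lines of Python. This is only an illustration, under the assumption that the ambient group is the integers; the function names are ours and do not come from the text.

```python
def sumset(A, B):
    """A + B = {a + b : a in A, b in B} (ambient group assumed to be the integers)."""
    return {a + b for a in A for b in B}


def difference_set(A, B):
    """A - B = {a - b : a in A, b in B}."""
    return {a - b for a in A for b in B}


def iterated_sumset(A, k):
    """kA = {a_1 + ... + a_k : a_i in A}, for an integer k >= 1."""
    result = set(A)
    for _ in range(k - 1):
        result = sumset(result, A)
    return result


def dilation(A, k):
    """k . A = {k * a : a in A}."""
    return {k * a for a in A}


A = {0, 1, 3}
print(sorted(iterated_sumset(A, 2)))  # the sumset 2A
print(sorted(dilation(A, 2)))         # the dilation 2 . A, a proper subset of 2A here
```

For this particular $A$ the dilation $2 \cdot A$ has three elements while the sumset $2A$ has six, which illustrates the caution above that the two notions usually differ.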

Additive sets may have a large or small amount of additive structure. A good 
example of a set with little additive structure would be a randomly chosen subset 
A of a finite additive group Z with some fixed cardinality. At the other extreme, 
examples of sets with very strong additive structure would include arithmetic 
progressions 


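$$a + [0, N) \cdot r := \{a, a + r, \ldots, a + (N - 1)r\}$$

where $a, r \in Z$ and $N \in \mathbf{Z}^+$; or $d$-dimensional generalized arithmetic progressions

$$a + [0, N) \cdot v := \{a + n_1 v_1 + \cdots + n_d v_d : 0 \le n_j < N_j \text{ for all } 1 \le j \le d\}$$

where $a \in Z$, $v = (v_1, \ldots, v_d) \in Z^d$, and $N = (N_1, \ldots, N_d) \in (\mathbf{Z}^+)^d$; or
$d$-dimensional cubes

$$a + \{0, 1\}^d \cdot v = \{a + \epsilon_1 v_1 + \cdots + \epsilon_d v_d : \epsilon_1, \ldots, \epsilon_d \in \{0, 1\}\};$$

or the subset sums $FS(A) := \{\sum_{a \in B} a : B \subseteq A\}$ of a finite set $A$.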




A fundamental task in this subject is to give some quantitative measures of 
additive structure in a set, and then investigate to what extent these measures are 
equivalent to each other. For example, one could try to quantify each of the fol- 
lowing informal statements as being some version of the assertion “A has additive 
structure”: 


• $A + A$ is small;
• $A - A$ is small;
• $A - A$ can be covered by a small number of translates of $A$;
• $kA$ is small for any fixed $k$;
• there are many quadruples $(a_1, a_2, a_3, a_4) \in A \times A \times A \times A$ such that
  $a_1 + a_2 = a_3 + a_4$;
• there are many quadruples $(a_1, a_2, a_3, a_4) \in A \times A \times A \times A$ such that
  $a_1 - a_2 = a_3 - a_4$;
• the convolution $1_A * 1_A$ is highly concentrated;
• the subset sums $FS(A) := \{\sum_{a \in B} a : B \subseteq A\}$ have high multiplicity;
• the Fourier transform $\widehat{1_A}$ is highly concentrated;
• the Fourier transform $\widehat{1_A}$ is highly concentrated in a cube;
• $A$ has a large intersection with a generalized arithmetic progression, of size
  comparable to $A$;
• $A$ is contained in a generalized arithmetic progression, of size comparable to $A$;
• $A$ (or perhaps $A - A$, or $2A - 2A$) contains a large generalized arithmetic
  progression.


The reader is invited to investigate to what extent these informal statements are 
true for sets such as progressions and cubes, and false for sets such as random sets. 
As it turns out, once one makes the above assertions more quantitative, there are 
a number of deep and important equivalences between them; indeed, to oversim- 
plify tremendously, all of the above criteria for additive structure are “essentially” 
equivalent. There is also a similar heuristic to quantify what it would mean for two 
additive sets A, B of comparable size to have a large amount of “shared additive 
structure” (e.g. A and B are progressions with the same step size v); we invite the 
reader to devise analogs of the above criteria to capture this concept. 
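
As a rough numerical illustration of the first few criteria, the following Python sketch compares an arithmetic progression with a random set of the same size. It rests on several assumptions of ours: the ambient group is the integers, "small" is measured by the ratio $|A + A|/|A|$, and "many quadruples" is measured by counting solutions to $a_1 + a_2 = a_3 + a_4$. The function names, the seed and the range $10^9$ are illustrative choices, not taken from the text.

```python
import random
from collections import Counter


def doubling_ratio(A):
    """|A + A| / |A| for a finite non-empty set of integers."""
    A = set(A)
    return len({a + b for a in A for b in A}) / len(A)


def additive_quadruples(A):
    """Number of (a1, a2, a3, a4) in A^4 with a1 + a2 = a3 + a4."""
    A = set(A)
    representations = Counter(a + b for a in A for b in A)
    # A sum s with r(s) representations contributes r(s)^2 quadruples.
    return sum(r * r for r in representations.values())


n = 50
progression = set(range(0, 3 * n, 3))          # step size 3, length n
rng = random.Random(0)                          # fixed seed for repeatability
random_set = set(rng.sample(range(10**9), n))   # n distinct random integers

for name, S in (("progression", progression), ("random set", random_set)):
    print(name, doubling_ratio(S), additive_quadruples(S))
```

One would expect the progression to have a doubling ratio just under 2 and a number of additive quadruples of order $n^3$, while the random set has a much larger doubling ratio and far fewer quadruples.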

Making the above heuristics precise and rigorous will require some work, and 
in fact will occupy large parts of Chapters 2, 3, 4, 5, 6. In deriving these basic tools 
of the field, we shall need to develop and combine techniques from elementary 
combinatorics, additive geometry, harmonic analysis, and graph theory; many of 
these methods are of independent interest in their own right, and so we have devoted 
some space to treating them in detail. 

Of course, a “typical” additive set will most likely behave like a random additive 
set, which one expects to have very little additive structure. Nevertheless, it is a
deep and surprising fact that as long as an additive set is dense enough in its ambient
group, it will always have some level of additive structure. The most famous
example of this principle is Szemerédi’s theorem, which asserts that every subset 
of the integers of positive upper density will contain arbitrarily long arithmetic 
progressions; we shall devote all of Chapter 11 to this beautiful and important the- 
orem. A variant of this fact is the very recent Green—Tao theorem, which asserts 
that every subset of the prime numbers of positive upper relative density also con- 
tains arbitrarily long arithmetic progressions; in particular, the primes themselves 
have this property. If one starts with an even sparser set A than the primes, then it 
is not yet known whether A will necessarily contain long progressions; however, 
if one forms sum sets such as A+ A, A + A + A, 2A — 2A, FS(A) then these 
sets contain extraordinarily long arithmetic progressions (see in particular Section 
4.7 and Chapter 12). This basic principle — that sumsets have much more addi- 
tive structure than general sets — is closely connected to the equivalences between 
the various types of additive structure mentioned previously; indeed results of the 
former type can be used to deduce results of the latter type, and conversely. 

We now describe some other topics covered in this text. In Chapter 1 we recall
the simple yet powerful probabilistic method, which is very useful in additive 
combinatorics for constructing sets with certain desirable properties (e.g. thin 
additive bases of the integers), and provides an important conceptual framework 
that complements more classical deterministic approaches to such constructions. 
In Chapter 6 we present some ways in which graph theory interacts with additive 
combinatorics, for instance in the theory of sum-free sets, or via Ramsey theory. 
Graph theory is also decisive in establishing two important results in the theory 
of sum sets, the Balog–Szemerédi–Gowers theorem and the Plünnecke inequalities.
Two other important tools from graph theory, namely the crossing number
inequality and the Szemerédi regularity lemma, will also be covered in Chapter 
8 and Sections 10.6, 11.6 respectively. In Chapter 7 we view sum sets from the 
perspective of random walks, and give some classical and recent results concerning 
the distribution of these sum sets, and in particular recent applications to random 
matrices. Last, but not least, in Chapter 9 we describe some algebraic methods, 
notably the combinatorial Nullstellensatz and Chevalley—Waring type methods, 
which have led to several deep arithmetical results (often with very sharp bounds) 
not obtainable by other means. 


Acknowledgements 


The authors would like to thank Shimon Brooks, Robin Chapman, Michael 
Cowling, Andrew Granville, Ben Green, Timothy Gowers, Harald Helfgott, Martin 
Klazar, Mariah Hamel, Vsevolod Lev, Roy Meshulam, Melvyn Nathanson, Imre 
Ruzsa, Roman Sasyk, and Benny Sudakov for helpful comments and corrections,
and to the Australian National University and the University of Edinburgh for their
hospitality while portions of this book were being written. Parts of this work were 
inspired by the lecture notes of Ben Green [144], the expository article of Imre 
Ruzsa [297], and the book by Melvyn Nathanson [257]. TT is also particularly 
indebted to Roman Sasyk and Hillel Furstenberg for explaining the ergodic the- 
ory proof of Szemerédi’s theorem. VV would like to thank Endre Szemerédi for 
many useful discussions on mathematics and other aspects of life. Last, and most 
importantly, the authors thank their wives, Laura and Huong, without whom this 
book would not be finished. 


General notation 


The following general notational conventions will be used throughout the book. 


Sets and functions 
For any set $A$, we use

$$A^d := A \times \cdots \times A = \{(a_1, \ldots, a_d) : a_1, \ldots, a_d \in A\}$$

to denote the Cartesian product of $d$ copies of $A$: thus for instance $\mathbf{Z}^d$ is the
$d$-dimensional integer lattice. We shall occasionally denote $A^d$ by $A^{\oplus d}$, in order to
distinguish this Cartesian product from the $d$-fold product set $A^{\cdot d} = A \cdot \ldots \cdot A$ of
$A$, or the $d$-fold powers $A^{\wedge d} := \{a^d : a \in A\}$ of $A$.

If $A$, $B$ are sets, we use $A \setminus B := \{a \in A : a \notin B\}$ to denote the set-theoretic
difference of $A$ and $B$; and $B^A$ to denote the space of functions $f : A \to B$ from
$A$ to $B$. We also use $2^A := \{B : B \subseteq A\}$ to denote the power set of $A$. We use $|A|$
to denote the cardinality of $A$. (We shall also use $|x|$ to denote the magnitude of a
real or complex number $x$, and $|v| = \sqrt{v_1^2 + \cdots + v_d^2}$ to denote the magnitude of
a vector $v = (v_1, \ldots, v_d)$ in a Euclidean space $\mathbf{R}^d$. The meaning of the absolute
value signs should be clear from context in all cases.)

If $A \subseteq Z$, we use $1_A : Z \to \{0, 1\}$ to denote the indicator function of $A$: thus
$1_A(x) = 1$ when $x \in A$ and $1_A(x) = 0$ otherwise. Similarly if $P$ is a property,
we let $\mathbf{I}(P)$ denote the quantity 1 if $P$ holds and 0 otherwise; thus for instance
$1_A(x) = \mathbf{I}(x \in A)$.

We use $\binom{n}{k} = \frac{n!}{k!(n-k)!}$ to denote the number of $k$-element subsets of an $n$-element
set. In particular we have the natural convention that $\binom{n}{k} = 0$ if $k > n$ or $k < 0$.


Number systems 


We shall rely frequently on the integers $\mathbf{Z}$, the positive integers $\mathbf{Z}^+ := \{1, 2, \ldots\}$,
the natural numbers $\mathbf{N} := \mathbf{Z}_{\ge 0} = \{0, 1, \ldots\}$, the reals $\mathbf{R}$, the positive reals
$\mathbf{R}^+ := \{x \in \mathbf{R} : x > 0\}$, the non-negative reals $\mathbf{R}_{\ge 0} := \{x \in \mathbf{R} : x \ge 0\}$, and the
complex numbers $\mathbf{C}$, as well as the circle group $\mathbf{R}/\mathbf{Z} := \{x + \mathbf{Z} : x \in \mathbf{R}\}$.

For any natural number $N \in \mathbf{N}$, we use $\mathbf{Z}_N := \mathbf{Z}/N\mathbf{Z}$ to denote the cyclic group
of order $N$, and use $n \mapsto n \bmod N$ to denote the canonical projection from $\mathbf{Z}$ to
$\mathbf{Z}_N$. If $q$ is a prime power, we use $F_q$ to denote the finite field of order $q$ (see
Section 9.4). In particular if $p$ is a prime then $F_p$ is identifiable with $\mathbf{Z}_p$.

If $x$ is a real number, we use $\lfloor x \rfloor$ to denote the greatest integer less than or equal
to $x$.


Landau asymptotic notation 


Let $n$ be a positive variable (usually taking values on $\mathbf{N}$, $\mathbf{Z}^+$, $\mathbf{R}_{\ge 0}$, or $\mathbf{R}^+$, and often
assumed to be large) and let $f(n)$ and $g(n)$ be real-valued functions of $n$.


$g(n) = O(f(n))$ means that $f$ is non-negative, and there is a positive constant
$C$ such that $|g(n)| \le C f(n)$ for all $n$.

$g(n) = \Omega(f(n))$ means that $f$, $g$ are non-negative, and there is a positive
constant $c$ such that $g(n) \ge c f(n)$ for all sufficiently large $n$.

$g(n) = \Theta(f(n))$ means that $f$, $g$ are non-negative and both $g(n) = O(f(n))$
and $g(n) = \Omega(f(n))$ hold; that is, there are positive constants $c$ and $C$ such that
$c f(n) \le g(n) \le C f(n)$ for all $n$.

$g(n) = o_{n \to \infty}(f(n))$ means that $f$ is non-negative and $g(n) = O(a(n) f(n))$ for
some $a(n)$ which tends to zero as $n \to \infty$; if $f$ is strictly positive, this is
equivalent to $\lim_{n \to \infty} g(n)/f(n) = 0$.

$g(n) = \omega_{n \to \infty}(f(n))$ means that $f$, $g$ are non-negative and $f(n) = o_{n \to \infty}(g(n))$.


In most cases the asymptotic variable $n$ will be clear from context, and we shall
simply write $o_{n \to \infty}(f(n))$ as $o(f(n))$, and similarly write $\omega_{n \to \infty}(f(n))$ as $\omega(f(n))$.
In some cases the constants $c$, $C$ and the decaying function $a(n)$ will depend on
some other parameters, in which case we indicate this by subscripts. Thus for
instance $g(n) = O_k(f(n))$ would mean that $g(n) \le C_k f(n)$ for all $n$, where $C_k$
depends on the parameter $k$; similarly, $g(n) = o_{n \to \infty; k}(f(n))$ would mean that
$g(n) = O(a(n) f(n))$ for some $a(n)$ which tends to zero as $n \to \infty$ for each
fixed $k$.

The notation $g(n) = \tilde{O}(f(n))$ has been used widely in the combinatorics and
theoretical computer science community in recent years; $g(n) = \tilde{O}(f(n))$ means
that there is a constant $c$ such that $g(n) \le f(n) \log^c n$ for all sufficiently large $n$.
We can define, in a similar manner, $\tilde{\Omega}$ and $\tilde{\Theta}$, though this notation will only be
used occasionally here. Here and throughout the rest of the book, log shall denote
the natural logarithm unless specified by subscripts, thus $\log_x y = \frac{\log y}{\log x}$.







Progressions 


We have already encountered the concept of a generalized arithmetic progression. 
We now make this concept more precise. 


Definition 0.2 (Progressions) For any integers $a \le b$, we let $[a, b]$ denote the
discrete closed interval $[a, b] := \{n \in \mathbf{Z} : a \le n \le b\}$; similarly define the
half-open discrete interval $[a, b)$, etc. More generally, if $a = (a_1, \ldots, a_d)$ and
$b = (b_1, \ldots, b_d)$ are elements of $\mathbf{Z}^d$ such that $a_j \le b_j$, we define the discrete box

$$[a, b] := \{(n_1, \ldots, n_d) \in \mathbf{Z}^d : a_j \le n_j \le b_j \text{ for all } 1 \le j \le d\},$$

and similarly

$$[a, b) := \{(n_1, \ldots, n_d) \in \mathbf{Z}^d : a_j \le n_j < b_j \text{ for all } 1 \le j \le d\},$$

etc. If $Z$ is an additive group, we define a generalized arithmetic progression (or
just progression for short) in $Z$ to be any set¹ of the form $P = a + [0, N] \cdot v$,
where $a \in Z$, $N = (N_1, \ldots, N_d)$ is a tuple, $[0, N] \subset \mathbf{Z}^d$ is a discrete box,
$v = (v_1, \ldots, v_d) \in Z^d$, the map $\cdot : \mathbf{Z}^d \times Z^d \to Z$ is the dot product

$$(n_1, \ldots, n_d) \cdot (v_1, \ldots, v_d) := n_1 v_1 + \cdots + n_d v_d,$$

and $[0, N] \cdot v := \{n \cdot v : n \in [0, N]\}$. In other words,

$$P = \{a + n_1 v_1 + \cdots + n_d v_d : 0 \le n_j \le N_j \text{ for all } 1 \le j \le d\}.$$

We call $a$ the base point of $P$, $v = (v_1, \ldots, v_d)$ the basis vectors of $P$, $N$ the
dimensions of $P$, $d$ the dimension or rank of $P$, and $\mathrm{vol}(P) := |[0, N]| = \prod_{j=1}^{d} (N_j + 1)$
the volume of $P$. We say that the progression $P$ is proper if the map $n \mapsto n \cdot v$ is
injective on $[0, N]$, or equivalently if the cardinality of $P$ is equal to its volume
(as opposed to being strictly smaller than the volume, which can occur if the basis
vectors are linearly dependent over $\mathbf{Z}$). We say that $P$ is symmetric if $-P = P$;
for instance $[-N, N] \cdot v = -N \cdot v + [0, 2N] \cdot v$ is a symmetric progression.
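
To make the notions of volume and properness concrete, here is a minimal Python sketch that builds $P = a + [0, N] \cdot v$ and tests properness by comparing $|P|$ with $\mathrm{vol}(P)$. We assume for illustration that the ambient group is the integers; the function names are ours.

```python
from itertools import product
from math import prod


def progression(a, N, v):
    """The set a + [0, N] . v, for integer a and integer tuples N, v of equal length."""
    boxes = [range(Nj + 1) for Nj in N]  # each n_j runs over 0, 1, ..., N_j
    return {a + sum(nj * vj for nj, vj in zip(n, v)) for n in product(*boxes)}


def volume(N):
    """vol(P) = |[0, N]| = product of (N_j + 1)."""
    return prod(Nj + 1 for Nj in N)


def is_proper(a, N, v):
    """P is proper exactly when its cardinality equals its volume."""
    return len(progression(a, N, v)) == volume(N)


print(is_proper(0, (2, 3), (1, 10)))  # steps 1 and 10: the map n -> n . v is injective
print(is_proper(0, (2, 3), (1, 2)))   # steps 1 and 2: e.g. (2, 0) and (0, 1) collide
```

In the second example the volume is 12 but the progression is just the interval $\{0, \ldots, 8\}$, so it is improper.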


1 Strictly speaking, this is an abuse of notation; the arithmetic progression should really be the
sextuple $(P, d, N, a, v, Z)$, because the set $P$ alone does not always uniquely determine the base
point, step, ambient space or even length (if the progression is improper) of the progression $P$.
However, as it would be cumbersome continually to use this sextuple, we shall usually just use $P$ to
denote the progression.


Other notation


There are a number of other definitions that we shall introduce at appropriate junc-
tures and which will be used in more than one chapter of the book. These include
the probabilistic notation (such as $\mathbf{E}()$, $\mathbf{P}()$, $\mathbf{I}()$, $\mathbf{Var}()$, $\mathbf{Cov}()$) that we introduce
at the start of Chapter 1, and measures of additive structure such as the doubling
constant $\sigma[A]$ (Definition 2.4), the Ruzsa distance $d(A, B)$ (Definition 2.5), and
the additive energy $E(A, B)$ (Definition 2.8). We also introduce the concept of a
partial sum set $A \stackrel{G}{+} B$ in Definition 2.28. The Fourier transform and the averaging
notation $\mathbf{E}_{x \in Z} f(x)$, $\mathbf{P}_Z(A)$ are defined in Section 4.1, Fourier bias $\|A\|_u$ is defined
in Definition 4.12, Bohr sets $\mathrm{Bohr}(S, \rho)$ are defined in Definition 4.17, and $\Lambda(p)$
constants are defined in Definition 4.26. The important notion of a Freiman homo-
morphism is defined in Definition 5.21. The notation for group theory (e.g. $\mathrm{ord}(x)$
and $\langle x \rangle$) is summarized in Section 3.1, while the notation for finite fields is sum-
marized in Section 9.4.


1 





The probabilistic method 


In additive number theory, one frequently faces the problem of showing that a 
set A contains a subset B with a certain property P. A very powerful tool for 
such a problem is Erdős’ probabilistic method. In order to show that such a subset 
B exists, it suffices to prove that a properly defined random subset of A satis- 
fies P with positive probability. The power of the probabilistic method has been 
justified by the fact that in most problems solved using this approach, it seems 
impossible to come up with a deterministically constructive proof of comparable 
simplicity. 

In this chapter we are going to present several basic probabilistic tools together 
with some representative applications of the probabilistic method, particularly 
with regard to additive bases and the primes. We shall require several standard 
facts about the distribution of primes $P = \{2, 3, 5, \ldots\}$; so as not to disrupt the
flow of the chapter we have placed these facts in an appendix (Section 1.10).


Notation. We assume the existence of some sample space (usually this will be
finite). If $E$ is an event in this sample space, we use $\mathbf{P}(E)$ to denote the probability
of $E$, and $\mathbf{I}(E)$ to denote the indicator function (thus $\mathbf{I}(E) = 1$ if $E$ occurs and 0
otherwise). If $E$, $F$ are events, we use $E \wedge F$ to denote the event that $E$, $F$ both
hold, $E \vee F$ to denote the event that at least one of $E$, $F$ hold, and $\bar{E}$ to denote the
event that $E$ does not hold. In this chapter all random variables will be assumed to
be real-valued (and usually denoted by $X$ or $Y$) or set-valued (and usually denoted
by $B$). If $X$ is a real-valued random variable with discrete support, we use

$$\mathbf{E}(X) := \sum_x x \mathbf{P}(X = x)$$

to denote the expectation of $X$, and

$$\mathbf{Var}(X) := \mathbf{E}(|X - \mathbf{E}(X)|^2) = \mathbf{E}(|X|^2) - |\mathbf{E}(X)|^2$$

to denote the variance. Thus for instance

$$\mathbf{E}(\mathbf{I}(E)) = \mathbf{P}(E); \quad \mathbf{Var}(\mathbf{I}(E)) = \mathbf{P}(E) - \mathbf{P}(E)^2. \tag{1.1}$$


If $F$ is an event of non-zero probability, we define the conditional probability of
another event $E$ with respect to $F$ by

$$\mathbf{P}(E|F) := \frac{\mathbf{P}(E \wedge F)}{\mathbf{P}(F)}$$

and similarly the conditional expectation of a random variable $X$ by

$$\mathbf{E}(X|F) := \frac{\mathbf{E}(X \mathbf{I}(F))}{\mathbf{P}(F)} = \sum_x x \mathbf{P}(X = x|F).$$

A random variable is boolean if it takes values in $\{0, 1\}$, or equivalently if it is an
indicator function $\mathbf{I}(E)$ for some event $E$.


1.1 The first moment method 


The simplest instance of the probabilistic method is the first moment method, which
seeks to control the distribution of a random variable $X$ in terms of its expectation
(or first moment) $\mathbf{E}(X)$. Firstly, we make the trivial observation (essentially the
pigeonhole principle) that $X \le \mathbf{E}(X)$ with positive probability, and $X \ge \mathbf{E}(X)$ with
positive probability. A more quantitative variant of this is


Theorem 1.1 (Markov’s inequality) Let $X$ be a non-negative random variable.
Then for any positive real $\lambda > 0$

$$\mathbf{P}(X \ge \lambda) \le \frac{\mathbf{E}(X)}{\lambda}. \tag{1.2}$$


Proof Start with the trivial inequality $X \ge \lambda \mathbf{I}(X \ge \lambda)$ and take expectations of
both sides.














Informally, this inequality asserts that X = O(E(X)) with high probability; for 
instance, X < 10E(X) with probability at least 0.9. Note that this is only an upper 
tail estimate; it gives an upper bound for how likely X is to be much larger than 
E(X), but does not control how likely X is to be much smaller than E(X). Indeed, 
if all one knows is the expectation E(X), it is easy to see that X could be as small 
as zero with probability arbitrarily close to 1, so the first moment method cannot 
give any non-trivial lower tail estimate. Later on we shall introduce more refined 
methods, such as the second moment method, that give further upper and lower 
tail estimates. 
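
Markov's inequality and the absence of a lower tail estimate can both be checked exactly on small examples. The following Python sketch does this with exact rational arithmetic; the two distributions (a fair die, and a variable that is zero with probability 99/100) are our own illustrative choices.

```python
from fractions import Fraction


def expectation(dist):
    """E(X) for a finite distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())


def upper_tail(dist, lam):
    """P(X >= lam)."""
    return sum(p for x, p in dist.items() if x >= lam)


# A fair six-sided die: non-negative, with E(X) = 7/2.
die = {x: Fraction(1, 6) for x in range(1, 7)}
for lam in range(1, 8):
    bound = expectation(die) / lam
    print(lam, upper_tail(die, lam), "<=", bound)

# No lower tail control: E(X) = 1, yet X = 0 with probability 99/100.
spiky = {0: Fraction(99, 100), 100: Fraction(1, 100)}
print(expectation(spiky), spiky[0])
```

The second distribution shows why knowledge of $\mathbf{E}(X)$ alone cannot rule out $X$ being zero with probability close to 1.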




To apply the first moment method, we of course need to compute the expecta- 
tions of random variables. A fundamental tool in doing so is linearity of expectation, 
which asserts that 


$$\mathbf{E}(c_1 X_1 + \cdots + c_n X_n) = c_1 \mathbf{E}(X_1) + \cdots + c_n \mathbf{E}(X_n) \tag{1.3}$$

whenever $X_1, \ldots, X_n$ are random variables and $c_1, \ldots, c_n$ are real numbers. The
power of this principle comes from there being no restriction on the independence
or dependence between the $X_i$s. A very typical application of (1.3) is in estimating
the size $|B|$ of a subset $B$ of a given set $A$, where $B$ is generated in some random
manner. From the obvious identity

$$|B| = \sum_{a \in A} \mathbf{I}(a \in B)$$

and (1.3), (1.1) we see that

$$\mathbf{E}(|B|) = \sum_{a \in A} \mathbf{P}(a \in B). \tag{1.4}$$


Again, we emphasize that the events a € B do not need to be independent in order 
for (1.4) to apply. 
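
To see (1.4) at work when the events $a \in B$ are strongly dependent, consider the following Python sketch. The setup is our own assumption for illustration: $A$ is a subset of the cyclic group $\mathbf{Z}_N$, a shift $x$ is chosen uniformly from $\mathbf{Z}_N$, and $B$ consists of those $a \in A$ lying in the window $x + [0, m)$. The expected size of $B$ is computed both directly and via (1.4).

```python
from fractions import Fraction


def expected_size_direct(A, N, m):
    """Average of |B| over all N shifts x, where B = {a in A : (a - x) mod N < m}."""
    total = sum(sum(1 for a in A if (a - x) % N < m) for x in range(N))
    return Fraction(total, N)


def membership_probability(a, N, m):
    """P(a in B) for a uniformly random shift x in Z_N."""
    return Fraction(sum(1 for x in range(N) if (a - x) % N < m), N)


def expected_size_by_linearity(A, N, m):
    """Right-hand side of (1.4): the sum over a in A of P(a in B)."""
    return sum(membership_probability(a, N, m) for a in A)


A, N, m = {0, 1, 2, 7}, 12, 4
print(expected_size_direct(A, N, m), expected_size_by_linearity(A, N, m))
```

Here the events $0 \in B$, $1 \in B$, $2 \in B$ are far from independent (they are all governed by the single shift $x$), yet the two computations agree.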
A weaker version of the linearity of expectation principle is the union bound

$$\mathbf{P}(E_1 \vee \cdots \vee E_n) \le \mathbf{P}(E_1) + \cdots + \mathbf{P}(E_n) \tag{1.5}$$

for arbitrary events $E_1, \ldots, E_n$ (compare this with (1.3) with $X_i := \mathbf{I}(E_i)$ and
$c_i := 1$). This trivial bound is still useful, especially in the case when the events
$E_1, \ldots, E_n$ are rare and not too strongly correlated (see Exercise 1.1.3). A related
estimate is as follows.


Lemma 1.2 (Borel–Cantelli lemma) Let $E_1, E_2, \ldots$ be a sequence of events
(possibly infinite or dependent), such that $\sum_n \mathbf{P}(E_n) < \infty$. Then for any positive integer
$M$, we have

$$\mathbf{P}(\text{Fewer than } M \text{ of the events } E_1, E_2, \ldots \text{ hold}) \ge 1 - \frac{\sum_n \mathbf{P}(E_n)}{M}.$$

In particular, with probability 1 at most finitely many of the events $E_1, E_2, \ldots$ hold.


Another useful way of phrasing the Borel–Cantelli lemma is that if $F_1, F_2, \ldots$
are events such that $\sum_n (1 - \mathbf{P}(F_n)) < \infty$, then, with probability 1, all but finitely
many of the events $F_n$ hold.


Proof By monotone convergence it suffices to prove the claim when there are
only finitely many events. From (1.3) we have $\mathbf{E}(\sum_n \mathbf{I}(E_n)) = \sum_n \mathbf{P}(E_n)$. If one
now applies Markov’s inequality with $\lambda = M$, the claim follows.
















1.1.1 Sum-free sets 


We now apply the first moment method to the theory of sum-free sets. An additive
set $A$ is called sum-free iff it does not contain three elements $x, y, z$ such that
$x + y = z$; equivalently, $A$ is sum-free iff $A \cap 2A = \emptyset$.


Theorem 1.3 Let A be an additive set of non-zero integers. Then A contains a 
sum-free subset B of size |B| > |A|/3. 


Proof Choose a prime number $p = 3k + 2$, where $k$ is sufficiently large so that
$A \subset [-p/3, p/3] \setminus \{0\}$. We can thus view $A$ as a subset of the cyclic group $\mathbf{Z}_p$
rather than the integers $\mathbf{Z}$, and observe that a subset $B$ of $A$ will be sum-free in $\mathbf{Z}_p$
if and only if¹ it is sum-free in $\mathbf{Z}$.

Now choose a random number $x \in \mathbf{Z}_p \setminus \{0\}$ uniformly, and form the random set

$$B := A \cap (x \cdot [k+1, 2k+1]) = \{a \in A : x^{-1}a \in \{k+1, \ldots, 2k+1\}\}.$$

Since $[k+1, 2k+1]$ is sum-free in $\mathbf{Z}_p$, we see that $x \cdot [k+1, 2k+1]$ is too,
and thus $B$ is a sum-free subset of $A$. We would like to show that $|B| > |A|/3$
with positive probability; by the first moment method it suffices to show that
$E(|B|) > |A|/3$. From (1.4) we have

$$E(|B|) = \sum_{a \in A} P(a \in B) = \sum_{a \in A} P(x^{-1}a \in [k+1, 2k+1]).$$

If $a \in A$, then $a$ is an invertible element of $\mathbf{Z}_p$, and thus $x^{-1}a$ is uniformly
distributed in $\mathbf{Z}_p \setminus \{0\}$. Since $|[k+1, 2k+1]| > \frac{p-1}{3}$ we conclude that
$P(x^{-1}a \in [k+1, 2k+1]) > \frac{1}{3}$ for all $a \in A$. Thus we have $E(|B|) > \frac{|A|}{3}$ as desired.
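The averaging argument in the proof of Theorem 1.3 can be turned into a small search: since the average of |B| over all dilations x exceeds |A|/3, trying every x in Z_p\{0} must find one that works. The following Python sketch does this under illustrative assumptions; the function names `sum_free_subset` and `is_sum_free`, the trial-division primality test, and the choice of the smallest admissible prime are not from the text.

```python
def is_prime(m):
    """Trial-division primality test (adequate for small illustrative inputs)."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def sum_free_subset(A):
    """Return a sum-free subset B of A with |B| > |A|/3.

    A is a finite collection of distinct non-zero integers. We pick a prime
    p = 3k + 2 with p > 3 * max|a|, then try every dilation x in Z_p \\ {0}
    and keep the largest set B = {a in A : x^{-1} a mod p in [k+1, 2k+1]}.
    """
    A = sorted(set(A))
    assert A and 0 not in A
    p = 3 * max(abs(a) for a in A) + 1
    while not (p % 3 == 2 and is_prime(p)):
        p += 1
    k = (p - 2) // 3
    best = []
    for x in range(1, p):
        x_inv = pow(x, -1, p)  # modular inverse; needs Python 3.8+
        B = [a for a in A if k + 1 <= (x_inv * a) % p <= 2 * k + 1]
        if len(B) > len(best):
            best = B
    return best


def is_sum_free(B):
    """Check that no x, y, z in B (x = y allowed) satisfy x + y = z."""
    S = set(B)
    return all(x + y not in S for x in S for y in S)
```

For example, `sum_free_subset(range(1, 31))` returns a subset of {1, ..., 30} with more than 10 elements that passes `is_sum_free`. The exhaustive loop over x replaces the random choice in the proof; it costs O(p|A|) time, which is fine for small inputs but is not meant as an efficient algorithm.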














Theorem 1.3 was proved by Erdős in 1965 [86]. Several years later, Bour- 
gain [37] used harmonic analysis arguments to improve the bound slightly. It is 
surprising that the following question is open. 


Question 1.4 Can one replace n/3 by (n/3) + 10? 


Alon and Kleiman [10] considered the case of more general additive sets (not 
necessarily in Z). They showed that in this case A always contains a sum-free 
subset of 2|A|/7 elements and the constant 2/7 is best possible. 

Another classical problem concerning sum-free sets is the Erdős-Moser problem.
Consider a finite additive set $A$. A subset $B$ of $A$ is sum-free with respect to
$A$ if $2^*B \cap A = \emptyset$, where $2^*B = \{b_1 + b_2 \mid b_1, b_2 \in B, b_1 \neq b_2\}$. Erdős and Moser
asked for an estimate of the size of the largest sum-free subset of any given set $A$
of cardinality $n$. We will discuss this problem in Section 6.2.1.


1 This trick can be placed in a more systematic context using the theory of Freiman homomorphisms: 
see Section 5.3. 




Exercises

1.1.1 If $X$ is a non-negative random variable, establish the identity

$$E(X) = \int_0^\infty P(X > \lambda)\, d\lambda \qquad (1.6)$$

and more generally for any $0 < p < \infty$

$$E(X^p) = p \int_0^\infty \lambda^{p-1} P(X > \lambda)\, d\lambda. \qquad (1.7)$$

Thus the probability distribution function $P(X > \lambda)$ controls all the
moments $E(X^p)$ of $X$.

1.1.2 When does equality hold in Markov’s inequality?

1.1.3 If $E_1, \ldots, E_n$ are arbitrary probabilistic events, establish the lower bound

$$P(E_1 \vee \cdots \vee E_n) \geq \sum_{i=1}^n P(E_i) - \sum_{1 \leq i < j \leq n} P(E_i \wedge E_j);$$

this bound should be compared with (1.5), and can be thought of as a variant
of the second moment method which we discuss in the next section.
(Hint: consider the random variable $\sum_{i=1}^n I(E_i) - \sum_{1 \leq i < j \leq n} I(E_i) I(E_j)$.)
More generally, establish the Bonferroni inequalities

$$P(E_1 \vee \cdots \vee E_n) \geq \sum_{A \subset [1,n] : 1 \leq |A| \leq k} (-1)^{|A|-1} P\Big(\bigwedge_{i \in A} E_i\Big)$$

when $k$ is even, and

$$P(E_1 \vee \cdots \vee E_n) \leq \sum_{A \subset [1,n] : 1 \leq |A| \leq k} (-1)^{|A|-1} P\Big(\bigwedge_{i \in A} E_i\Big)$$

when $k$ is odd.

1.1.4 Let $X$ be a non-negative random variable. Establish the popularity principle
$E(X I(X > \frac{1}{2}E(X))) \geq \frac{1}{2}E(X)$. In particular, if $X$ is bounded by some
constant $M$, then $P(X > \frac{1}{2}E(X)) \geq \frac{1}{2M}E(X)$. Thus while there is in general
no lower tail estimate on the event $X \leq \frac{1}{2}E(X)$, we can say that the
majority of the expectation of $X$ is generated outside of this tail event,
which does lead to a lower tail estimate if $X$ is bounded.
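1.1.5 Let $A, B$ be non-empty subsets of a finite additive group $Z$. Show that
there exists an $x \in Z$ such that

$$1 - \frac{|(A + x) \cup B|}{|Z|} \leq \Big(1 - \frac{|A|}{|Z|}\Big)\Big(1 - \frac{|B|}{|Z|}\Big),$$

and a $y \in Z$ such that

$$1 - \frac{|(A + y) \cup B|}{|Z|} \geq \Big(1 - \frac{|A|}{|Z|}\Big)\Big(1 - \frac{|B|}{|Z|}\Big).$$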





1.1.6 Consider a set $A$ as above. Show that there exists a subset $\{v_1, \ldots, v_d\}$ of
$Z$ with $d = O(\log \frac{|Z|}{|A|})$ such that

$$|A + [0,1]^d \cdot (v_1, \ldots, v_d)| \geq |Z|/2.$$

1.1.7 Consider a set $A$ as above. Show that there exists a subset $\{v_1, \ldots, v_d\}$ of
$Z$ with $d = O(\log \frac{|Z|}{|A|} + \log\log(10 + |Z|))$ such that

$$A + [0,1]^d \cdot (v_1, \ldots, v_d) = Z.$$


1.2 The second moment method 


The first moment method allows one to control the order of magnitude of a random 
variable X by its expectation E(X). In many cases, this control is insufficient, and 
one also needs to establish that X usually does not deviate too greatly from its 
expected value. These types of estimates are known as large deviation inequali- 
ties, and are a fundamental set of tools in the subject. They can be significantly 
more powerful than the first moment method, but often require some assumptions 
concerning independence or approximate independence. 

The simplest such large deviation inequality is Chebyshev’s inequality, which 
controls the deviation in terms of the variance Var(X ): 


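Theorem 1.5 (Chebyshev’s inequality) Let $X$ be a random variable. Then for
any positive $\lambda$

$$P(|X - E(X)| > \lambda \mathrm{Var}(X)^{1/2}) \leq \frac{1}{\lambda^2}. \qquad (1.8)$$

Proof We may assume $\mathrm{Var}(X) > 0$ as the case $\mathrm{Var}(X) = 0$ is trivial. From
Markov’s inequality we have

$$P(|X - E(X)|^2 > \lambda^2 \mathrm{Var}(X)) \leq \frac{E(|X - E(X)|^2)}{\lambda^2 \mathrm{Var}(X)} = \frac{1}{\lambda^2}$$

and the claim follows.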





Thus Chebyshev’s inequality asserts that $X = E(X) + O(\mathrm{Var}(X)^{1/2})$ with high
probability, while in the converse direction it is clear that $|X - E(X)| \geq \mathrm{Var}(X)^{1/2}$
with positive probability. The application of these facts is referred to as the second
moment method. Note that Chebyshev’s inequality provides both upper tail and
lower tail bounds on $X$, with the tail decaying like $1/\lambda^2$ rather than $1/\lambda$. Thus
the second moment method tends to give better distributional control than the
first moment method. The downside is that the second moment method requires
computing the variance, which is often trickier than computing the expectation.

Assume that $X = X_1 + \cdots + X_n$, where the $X_i$s are random variables. In view of
(1.3), one might wonder whether

$$\mathrm{Var}(X) = \mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_n). \qquad (1.9)$$

This equality holds in the special case when the $X_i$s are pairwise independent (and
in particular when they are jointly independent), but does not hold in general. For
arbitrary $X_i$s, we instead have

$$\mathrm{Var}(X) = \sum_{i=1}^n \mathrm{Var}(X_i) + \sum_{i,j \in [1,n] : i \neq j} \mathrm{Cov}(X_i, X_j), \qquad (1.10)$$

where the covariance $\mathrm{Cov}(X_i, X_j)$ is defined as

$$\mathrm{Cov}(X_i, X_j) := E\big((X_i - E(X_i))(X_j - E(X_j))\big) = E(X_i X_j) - E(X_i)E(X_j).$$


Applying (1.9) to the special case when $X = |B|$, where $B$ is some randomly
generated subset of a set $A$, we see from (1.1) that if the events $a \in B$ are pairwise
independent for all $a \in A$, then

$$\mathrm{Var}(|B|) = \sum_{a \in A} \big(P(a \in B) - P(a \in B)^2\big) \qquad (1.11)$$

and in particular we see from (1.4) that

$$\mathrm{Var}(|B|) \leq E(|B|). \qquad (1.12)$$

In the case when the events $a \in B$ are not pairwise independent, we must replace
(1.11) by the more complicated identity

$$\mathrm{Var}(|B|) = \sum_{a \in A} \big(P(a \in B) - P(a \in B)^2\big) + \sum_{a, a' \in A : a \neq a'} \mathrm{Cov}(I(a \in B), I(a' \in B)). \qquad (1.13)$$
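To make the expectation identity (1.4) and the variance identity (1.13) concrete, here is a small Python check with exact rational arithmetic. The setup is an illustrative assumption, not from the text: x is uniform on {1, ..., 12}, A = {2, 3, 4, 6}, and B is the set of elements of A dividing x, so the events a ∈ B are visibly dependent.

```python
from fractions import Fraction
from itertools import permutations

N = 12                # x is uniform on {1, ..., N} (illustrative choice)
A = [2, 3, 4, 6]      # B := {a in A : a divides x} (illustrative choice)


def expect(f):
    """Exact expectation of f(x) for x uniform on {1, ..., N}."""
    return sum(Fraction(f(x)) for x in range(1, N + 1)) / N


def indicator(a):
    """Indicator function of the event 'a is in B', i.e. a divides x."""
    return lambda x: 1 if x % a == 0 else 0


def size_of_B(x):
    return sum(indicator(a)(x) for a in A)


def prob(a):
    return expect(indicator(a))


def cov(a, b):
    joint = expect(lambda x: indicator(a)(x) * indicator(b)(x))
    return joint - prob(a) * prob(b)


mean_direct = expect(size_of_B)
mean_formula = sum(prob(a) for a in A)                      # right side of (1.4)

var_direct = expect(lambda x: (size_of_B(x) - mean_direct) ** 2)
var_formula = (sum(prob(a) - prob(a) ** 2 for a in A)       # diagonal terms
               + sum(cov(a, b) for a, b in permutations(A, 2)))  # a != a'
```

Both pairs agree exactly: `mean_direct == mean_formula` (each equals 5/4) and `var_direct == var_formula`. Dropping the covariance sum would give the pairwise-independent formula (1.11), which does not match `var_direct` here because divisibility by 2 and by 4 are correlated.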


1.2.1 The number of prime divisors 


Now we present a nice application of the second moment method to classical
number theory. To this end, let¹

$$\nu(n) := \sum_{p \leq n} I(p \mid n)$$


1 We shall adopt the convention that whenever a summation is over the index p, then p is understood 
to be prime. 




denote the number of prime divisors of n. This function is among the most studied 
objects in classical number theory. Hardy and Ramanujan in the 1920s showed that 
“almost” all n have about log log n prime divisors. We give a very simple proof of 
this result, found by Turán in 1934 [369].

Theorem 1.6 Let $\omega(n)$ tend to infinity arbitrarily slowly. Then

$$|\{x \in [1,n] : |\nu(x) - \log\log x| > \omega(n)\sqrt{\log\log n}\}| = o(n). \qquad (1.14)$$

Informally speaking, this result asserts that for a “generic” integer $x$, we have
$\nu(x) = \log\log x + O(\sqrt{\log\log x})$ with high probability.


Proof Let $x$ be chosen uniformly at random from the interval $\{1, 2, \ldots, n\}$. Our
task is now to show that

$$P(|\nu(x) - \log\log x| > \omega(n)\sqrt{\log\log n}) = o(1).$$

Due to a technical reason, instead of $\nu(x)$ we shall consider the related quantity
$|B|$, where

$$B := \{p \text{ prime} : p \leq n^{1/10},\ p \mid x\}.$$

Since $x$ cannot have 10 different prime divisors larger than $n^{1/10}$, it follows that
$|B| \leq \nu(x) \leq |B| + 10$. Thus, to prove (1.14), it suffices to show

$$P(||B| - \log\log n| > \omega(n)\sqrt{\log\log n}) = o(1).$$

Note that $\log\log x = \log\log n + O(1)$ with probability $1 - o(1)$. In light of
Chebyshev’s inequality, this will follow from the following expectation and variance
estimates:

$$E(|B|),\ \mathrm{Var}(|B|) = \log\log n + O(1).$$


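It remains to verify the expectation and variance estimate. From linearity of
expectation (1.4) we have

$$E(|B|) = \sum_{p \leq n^{1/10}} P(p \mid x)$$

while from the variance identity (1.13) we have

$$\mathrm{Var}(|B|) = \sum_{p \leq n^{1/10}} \big(P(p \mid x) - P(p \mid x)^2\big) + \sum_{p, q \leq n^{1/10} : p \neq q} \mathrm{Cov}(I(p \mid x), I(q \mid x)).$$

Observe that $I(p \mid x) I(q \mid x) = I(pq \mid x)$. Since $P(d \mid x) = \frac{1}{d} + O(\frac{1}{n})$ for any $d \geq 1$,
we conclude that

$$P(p \mid x) = \frac{1}{p} + O\Big(\frac{1}{n}\Big)$$

and

$$\mathrm{Cov}(I(p \mid x), I(q \mid x)) = \frac{1}{pq} + O\Big(\frac{1}{n}\Big) - \Big(\frac{1}{p} + O\Big(\frac{1}{n}\Big)\Big)\Big(\frac{1}{q} + O\Big(\frac{1}{n}\Big)\Big) = O\Big(\frac{1}{n}\Big).$$

We thus conclude that

$$E(|B|) = \sum_{p \leq n^{1/10}} \frac{1}{p} + O(n^{-9/10})$$

and

$$\mathrm{Var}(|B|) = \sum_{p \leq n^{1/10}} \Big(\frac{1}{p} - \frac{1}{p^2}\Big) + O(n^{-8/10}).$$

The expectation and variance estimates now follow from Mertens’ theorem (see
Proposition 1.51) and the convergence of the sum $\sum_k \frac{1}{k^2}$.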
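The concentration of the number of prime divisors around log log n can be observed numerically. The Python sketch below is an illustration under stated assumptions: the function names, the sieve, and the choice of statistics are not from the text. It computes ν(x) for all x ≤ n, and reports the empirical mean, the second moment about L = log log n, and the fraction of x with |ν(x) - L| > K√L. By Markov’s inequality applied to (ν(x) - L)², that fraction never exceeds the second moment divided by K²L, which is the Chebyshev-type step used in the proof.

```python
import math


def prime_divisor_counts(n):
    """Return a list nu with nu[x] = number of distinct primes dividing x, 0 <= x <= n."""
    nu = [0] * (n + 1)
    for p in range(2, n + 1):
        if nu[p] == 0:  # no smaller prime divides p, so p is prime
            for multiple in range(p, n + 1, p):
                nu[multiple] += 1
    return nu


def turan_statistics(n, K):
    """Empirical statistics of nu(x) for x uniform on {1, ..., n}.

    Returns (L, mean, second_moment, exceptional_fraction) where
    L = log log n, second_moment = E (nu(x) - L)^2, and exceptional_fraction
    is the proportion of x with |nu(x) - L| > K * sqrt(L).
    """
    nu = prime_divisor_counts(n)[1:]
    L = math.log(math.log(n))
    mean = sum(nu) / n
    second_moment = sum((v - L) ** 2 for v in nu) / n
    exceptional = sum(1 for v in nu if abs(v - L) > K * math.sqrt(L)) / n
    return L, mean, second_moment, exceptional
```

Calling `turan_statistics(10**5, 3)` takes well under a second and shows a mean within a bounded distance of log log n and a small exceptional fraction. Because log log n grows so slowly, the asymptotic regime of Theorem 1.6 is far out of reach of direct computation; the sketch only illustrates the mechanism.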














Exercises 

1.2.1 When does equality hold in Chebyshev’s inequality?

1.2.2 If $X$ and $Y$ are two random variables, verify the Cauchy-Schwarz
inequality $|\mathrm{Cov}(X, Y)| \leq \mathrm{Var}(X)^{1/2}\mathrm{Var}(Y)^{1/2}$ and the triangle inequality
$\mathrm{Var}(X + Y)^{1/2} \leq \mathrm{Var}(X)^{1/2} + \mathrm{Var}(Y)^{1/2}$. When does equality occur?

1.2.3 Prove (1.10).

1.2.4 If $\phi : \mathbf{R} \to \mathbf{R}$ is a convex function and $X$ is a random variable, verify
Jensen’s inequality $E(\phi(X)) \geq \phi(E(X))$. If $\phi$ is strictly convex, when
does equality occur?

1.2.5 Generalize Chebyshev’s inequality using higher moments
$E(|X - E(X)|^p)$ instead of the variance.

1.2.6 By obtaining an upper bound on the fourth moment, improve Theorem 1.6
to

$$\frac{1}{N}|\{x \in [1, N] : |\nu(x) - \log\log N| > K\sqrt{\log\log N}\}| = O(K^{-4}).$$

Can you generalize this to obtain a bound of $O_m(K^{-m})$ for any even integer
$m \geq 2$, where the constant in the $O()$ notation is allowed to depend on
$m$?


1.3 The exponential moment method 


Chebyshev’s inequality shows that if one has control of the second moment
$\mathrm{Var}(X) = E(|X - E(X)|^2)$, then a random variable $X$ takes the value
$E(X) + O(\lambda \mathrm{Var}(X)^{1/2})$ with probability $1 - O(\lambda^{-2})$. If one uses higher moments, one
can obtain better decay of the tail probability than $O(\lambda^{-2})$. In particular, if one can
control exponential moments¹ such as $E(e^{tX})$ for some real parameter $t$, then one
can obtain exponential decay in upper and lower tail probabilities, since Markov’s
inequality yields

$$P(X \geq \lambda) = P(e^{tX} \geq e^{t\lambda}) \leq \frac{E(e^{tX})}{e^{t\lambda}} \qquad (1.15)$$

for $t > 0$ and $\lambda \in \mathbf{R}$, and similarly

$$P(X \leq -\lambda) = P(e^{-tX} \geq e^{t\lambda}) \leq \frac{E(e^{-tX})}{e^{t\lambda}} \qquad (1.16)$$

for the same range of $t, \lambda$. The quantity $E(e^{tX})$ is known as an exponential moment
of $X$, and the function $t \mapsto E(e^{tX})$ is known as the moment generating function,
thanks to the Taylor expansion

$$E(e^{tX}) = 1 + tE(X) + \frac{t^2}{2!}E(X^2) + \frac{t^3}{3!}E(X^3) + \cdots.$$

The application of (1.15) or (1.16) is known as the exponential moment method.
Of course, to use it effectively one needs to be able to compute the exponential
moments $E(e^{tX})$. A preliminary tool for doing so is


Lemma 1.7 Let $X$ be a random variable with $|X| \leq 1$ and $E(X) = 0$. Then for
any $-1 \leq t \leq 1$ we have $E(e^{tX}) \leq \exp(t^2 \mathrm{Var}(X))$.

Proof Since $|tX| \leq 1$, a simple comparison of Taylor series gives the inequality

$$e^{tX} \leq 1 + tX + t^2X^2.$$

Taking expectations of both sides and using linearity of expectation and the
hypothesis $E(X) = 0$ we obtain

$$E(e^{tX}) \leq 1 + t^2\mathrm{Var}(X) \leq \exp(t^2\mathrm{Var}(X))$$

as desired.





This lemma by itself is not terribly effective as it requires both $X$ and $t$ to be
bounded. However the power of this lemma can be amplified considerably when
applied to random variables $X$ which are sums of bounded random variables,
$X = X_1 + \cdots + X_n$, provided that we have the very strong assumption of joint
independence between the $X_1, \ldots, X_n$. More precisely, we have


1 To avoid questions of integrability or measurability, let us assume for sake of discussion that the 
random variable X here only takes finitely many values; this is the case of importance in 
combinatorial applications. 




Theorem 1.8 (Chernoff’s inequality) Assume that $X_1, \ldots, X_n$ are jointly
independent random variables where $|X_i - E(X_i)| \leq 1$ for all $i$. Set $X := X_1 + \cdots + X_n$
and let $\sigma := \sqrt{\mathrm{Var}(X)}$ be the standard deviation of $X$. Then for any $\lambda > 0$

$$P(|X - E(X)| \geq \lambda\sigma) \leq 2\max\big(e^{-\lambda^2/4}, e^{-\lambda\sigma/2}\big). \qquad (1.17)$$

Informally speaking, (1.17) asserts that $X = E(X) + O(\mathrm{Var}(X)^{1/2})$ with high
probability, and $X = E(X) + O(\ln^{1/2} n\, \mathrm{Var}(X)^{1/2})$ with extremely high probability
($1 - O(n^{-C})$ for some large $C$). The bound in Chernoff’s theorem provides
a huge improvement over Chebyshev’s inequality when $\lambda$ is large. However the
joint independence of the $X_i$ is essential (Exercise 1.3.8). Later on we shall develop
several variants of Chernoff’s inequality in which there is some limited interaction
between the $X_i$.


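Proof By subtracting a constant from each of the $X_i$ we may normalize $E(X_i) = 0$
for each $i$. Observe that $P(|X| \geq \lambda\sigma) = P(X \geq \lambda\sigma) + P(X \leq -\lambda\sigma)$. By symmetry,
it thus suffices to prove that

$$P(X \geq \lambda\sigma) \leq e^{-t\lambda\sigma/2} \qquad (1.18)$$

where $t := \min(\lambda/2\sigma, 1)$.
Applying (1.15) we have

$$P(X \geq \lambda\sigma) \leq e^{-t\lambda\sigma} E(e^{tX_1} \cdots e^{tX_n}).$$

Since the $X_i$ are jointly independent, so are the $e^{tX_i}$. Using this and Lemma 1.7
we obtain

$$E(e^{tX_1} \cdots e^{tX_n}) = E(e^{tX_1}) \cdots E(e^{tX_n}) \leq \exp(t^2\mathrm{Var}(X_1)) \cdots \exp(t^2\mathrm{Var}(X_n)).$$

On the other hand, from (1.9) we have

$$\mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_n) = \sigma^2.$$

Putting all this together, we obtain

$$P(X \geq \lambda\sigma) \leq e^{-t\lambda\sigma} e^{t^2\sigma^2}.$$

Since $t \leq \lambda/2\sigma$, the claim follows.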
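As a numerical sanity check of Chernoff’s inequality, the following Python sketch compares the exact two-sided tail of a binomial variable (a sum of n independent boolean variables, each equal to 1 with probability q) with the bound 2 max(e^{-λ²/4}, e^{-λσ/2}). The function names and the parameter choices n = 200, q = 0.5 or 0.1 are illustrative assumptions, not from the text.

```python
import math


def binomial_two_sided_tail(n, q, lam):
    """Exact P(|X - E(X)| >= lam * sigma) for X ~ Binomial(n, q).

    Returns the pair (tail probability, sigma), where sigma = sqrt(n q (1 - q)).
    """
    mean = n * q
    sigma = math.sqrt(n * q * (1 - q))
    tail = sum(
        math.comb(n, j) * q**j * (1 - q) ** (n - j)  # math.comb needs Python 3.8+
        for j in range(n + 1)
        if abs(j - mean) >= lam * sigma
    )
    return tail, sigma


def chernoff_bound(lam, sigma):
    """Right-hand side of Chernoff's inequality: 2 max(e^{-lam^2/4}, e^{-lam sigma/2})."""
    return 2 * max(math.exp(-lam**2 / 4), math.exp(-lam * sigma / 2))


def chebyshev_bound(lam):
    """Right-hand side of Chebyshev's inequality, for comparison."""
    return 1 / lam**2
```

For instance, with n = 200, q = 0.5 and λ = 4 the Chernoff bound is 2e^{-4}, already smaller than the Chebyshev bound 1/16, and the gap widens rapidly as λ grows. For small λ the Chernoff bound exceeds 1 and carries no information, which is consistent with the remark that the improvement appears when λ is large.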











Now let us consider a special, but important case when the $X_i$s are independent
boolean (or Bernoulli) variables.

Corollary 1.9 Let $X = t_1 + \cdots + t_n$ where the $t_i$ are independent boolean random
variables. Then for any $\epsilon > 0$

$$P(|X - E(X)| \geq \epsilon E(X)) \leq 2e^{-\min(\epsilon^2/4,\, \epsilon/2)E(X)}. \qquad (1.19)$$

Applying this with $\epsilon = 1/2$ (for instance), we conclude in particular that

$$P(X = \Theta(E(X))) \geq 1 - 2e^{-E(X)/16}. \qquad (1.20)$$

Proof From (1.1) we have that $|t_i - E(t_i)| \leq 1$ and $\mathrm{Var}(t_i) \leq E(t_i)$. Summing
this using (1.3), (1.9), we conclude that $\mathrm{Var}(X) \leq E(X)$ (cf. (1.12)). The claim
now follows from Theorem 1.8 with $\lambda := \epsilon E(X)/\sigma$.














As an immediate consequence of Corollary 1.9 and (1.4) we obtain the following 
concentration of measure property for the distribution of certain types of random 
sets. 


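Corollary 1.10 Let $A$ be a set (possibly infinite), and let $B \subset A$ be a random
subset of $A$ with the property that the events $a \in B$ are independent for every
$a \in A$. Then for any $\epsilon > 0$ and any finite $A' \subset A$ we have

$$P\Big(\Big||B \cap A'| - \sum_{a \in A'} p_a\Big| \geq \epsilon \sum_{a \in A'} p_a\Big) \leq 2e^{-\min(\epsilon^2/4,\, \epsilon/2)\sum_{a \in A'} p_a}$$

where $p_a := P(a \in B)$. In particular

$$P\Big(|B \cap A'| = \Theta\Big(\sum_{a \in A'} p_a\Big)\Big) \geq 1 - 2e^{-\sum_{a \in A'} p_a/16}.$$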


1.3.1 Sidon’s problem on thin bases 


We now apply Chernoff’s inequality to the study of thin bases in additive combi- 
natorics. 


Definition 1.11 (Bases) Let $B \subset \mathbf{N}$ be an (infinite) set of natural numbers, and
let $k \in \mathbf{Z}^+$. We define the counting function $r_{k,B}(n)$ for any $n \in \mathbf{N}$ as

$$r_{k,B}(n) := |\{(b_1, \ldots, b_k) \in B^k : b_1 + \cdots + b_k = n\}|.$$

We say that $B$ is a basis of order $k$ if every sufficiently large positive integer can be
represented as a sum of $k$ (not necessarily distinct) elements of $B$, or equivalently if
$r_{k,B}(n) \geq 1$ for all sufficiently large $n$. Alternatively, $B$ is a basis of order $k$ if and
only if $\mathbf{N} \setminus kB$ is finite.


Examples 1.12 The squares $\mathbf{N}^{\wedge 2} = \{0, 1, 4, 9, \ldots\}$ are known to be a basis of
order 4 (Lagrange’s theorem), while the primes $\mathcal{P} = \{2, 3, 5, 7, \ldots\}$ are
conjectured to be a basis of order 3 (Goldbach’s conjecture) and are known to
be a basis of order 4 (Vinogradov’s theorem). Furthermore, for any $k \geq 1$, the
$k$th powers $\mathbf{N}^{\wedge k} = \{0^k, 1^k, 2^k, \ldots\}$ are known to be a basis of order $C(k)$ for
some finite $C(k)$ (Waring’s conjecture, first proven by Hilbert). Indeed in this
case, the powerful Hardy-Littlewood circle method yields the stronger result that
$r_{m,\mathbf{N}^{\wedge k}}(n) = \Theta_m(n^{m/k - 1})$ for all large $n$, if $m$ is sufficiently large depending on
$k$ (see for instance [379] for a discussion). On the other hand, the powers of $k$,
$k^{\wedge \mathbf{N}} = \{k^0, k^1, k^2, \ldots\}$, and the infinite progression $k \cdot \mathbf{N} = \{0, k, 2k, \ldots\}$ are not
bases of any order when $k > 1$.


The function $r_{k,B}$ is closely related to the density of the set $B$. Indeed, we have
the easy inequalities

$$\sum_{n \leq N} r_{k,B}(n) \leq |B \cap [0, N]|^k \leq \sum_{n \leq kN} r_{k,B}(n) \qquad (1.21)$$

for any $N \geq 1$; this reflects the obvious fact that if $n = b_1 + \cdots + b_k$ is a decomposition
of a natural number $n$ into $k$ natural numbers $b_1, \ldots, b_k$, then $n \leq N$ implies
that $b_1, \ldots, b_k \in [0, N]$, and conversely $b_1, \ldots, b_k \in [0, N]$ implies $n \leq kN$. In
particular if $B$ is a basis of order $k$ then

$$|B \cap [0, N]| = \Omega(N^{1/k}). \qquad (1.22)$$
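The counting function of Definition 1.11 and the inequalities (1.21) are easy to compute for small examples. The Python sketch below counts ordered k-tuples by repeated convolution and checks (1.21) for the squares with k = 4; the function name `representation_counts` and the parameters are illustrative assumptions, not from the text.

```python
def representation_counts(B, k, limit):
    """Return a list r with r[n] = number of ordered k-tuples from B summing to n.

    B is a collection of natural numbers; r is indexed by 0 <= n <= limit.
    Elements of B larger than limit cannot occur in such a sum and are ignored.
    """
    elems = sorted(b for b in set(B) if 0 <= b <= limit)
    r = [0] * (limit + 1)
    r[0] = 1  # the empty sum: exactly one way to write 0 with zero summands
    for _ in range(k):
        new = [0] * (limit + 1)
        for s, count in enumerate(r):
            if count == 0:
                continue
            for b in elems:
                if s + b > limit:
                    break  # elems is sorted, so later b are too large as well
                new[s + b] += count
        r = new
    return r


# Check (1.21) for the squares, with k = 4 and N = 50.
N, k = 50, 4
squares = [i * i for i in range(0, 15)]        # all squares up to 196 <= k * N
r = representation_counts(squares, k, k * N)
lower = sum(r[: N + 1])                         # sum of r(n) over n <= N
middle = sum(1 for b in squares if b <= N) ** k  # |B ∩ [0, N]|^k
upper = sum(r)                                  # sum of r(n) over n <= kN
```

Here `lower <= middle <= upper` holds, and every entry of `r` is at least 1, as expected since every natural number is a sum of four squares.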


Let us say that a basis $B$ of order $k$ is thin if $r_{k,B}(n) = O(\log n)$ for all large
$n$. This would mean that $|B \cap [0, N]| = N^{1/k + o(1)}$, thus the basis $B$ would be
nearly as “thin” as possible given (1.22). In the 1930s, Sidon asked the question
of whether thin bases actually exist (or more generally, any basis which is “high
quality” in the sense that $r_{k,B}(n) = n^{o(1)}$ for all $n$). As Erdős recalled in one of his
memoirs, he thought he could provide an answer within a few days. It took a little
bit longer. In 1956, Erdős [92] positively answered Sidon’s question.


Theorem 1.13 There exists a basis $B \subset \mathbf{Z}^+$ of order 2 so that $r_{2,B}(n) = \Theta(\log n)$
for every sufficiently large $n$. In particular, there exists a thin basis of order 2.

Remark 1.14 A very old, but still unsolved conjecture of Erdős and Turán [98]
states that if $B \subset \mathbf{N}$ is a basis of order 2, then $\limsup_{n \to \infty} r_{2,B}(n) = \infty$. In fact,
Erdős later conjectured that $\limsup_{n \to \infty} r_{2,B}(n)/\log n > 0$ (so that the thin basis
constructed above is essentially as thin as possible). Nothing is known concerning
these conjectures (though see Exercise 1.3.10 for a much weaker result).


Proof Define¹ a set $B \subset \mathbf{Z}^+$ randomly by requiring the events $n \in B$ (for $n \in \mathbf{Z}^+$)
to be jointly independent with probability

$$P(n \in B) = \min\Big(C\sqrt{\frac{\log n}{n}},\ 1\Big)$$

1 Strictly speaking, to make this argument rigorous one needs an infinite probability space such as 
Wiener space, which in turn requires a certain amount of measure theory to construct. One can 
avoid this by proving a “‘finitary” version of Theorem 1.13 to provide a thin basis for an interval 
[1, N] for all sufficiently large N, and then gluing those bases together; we leave the details to the 
interested reader. A similar remark applies to other random subsets of Z+ which we shall construct 
later in this chapter. 







where $C > 0$ is a large constant to be chosen later. We now show that $r_{2,B}(n) = \Theta(\log n)$
for all sufficiently large $n$ with positive probability (indeed, it is true with
probability 1). Writing

$$r_{2,B}(n) = \sum_{i+j=n} I(i \in B)I(j \in B) = 2\sum_{1 \leq i < n/2} I(i \in B)I(n - i \in B) + O(1)$$

we see that it suffices to show that the probability

$$P\Big(\sum_{1 \leq i < n/2} I(i \in B)I(n - i \in B) = \Theta(\log n) \text{ for all but finitely many } n\Big)$$

is positive (if the constants in the $\Theta()$ notation are chosen appropriately). By
the Borel-Cantelli lemma (Lemma 1.2) and the convergence of $\sum_{n=1}^\infty \frac{1}{n^2}$, it thus
suffices to show that

$$P\Big(\sum_{1 \leq i < n/2} I(i \in B)I(n - i \in B) = \Theta(\log n)\Big) = 1 - O\Big(\frac{1}{n^2}\Big)$$

for all large $n$.
By linearity of expectation (1.3), we have for $n > 1$

$$E\Big(\sum_{1 \leq i < n/2} I(i \in B)I(n - i \in B)\Big) = \sum_{1 \leq i < n/2} C^2\sqrt{\frac{\log i}{i}}\sqrt{\frac{\log(n - i)}{n - i}} + O_C(1)$$

$$= \Theta\Big(C^2 \frac{\log^{1/2} n}{n^{1/2}} \sum_{1 \leq i < n/2} \frac{\log^{1/2} i}{i^{1/2}}\Big) + O_C(1)$$

$$= \Theta(C^2 \log n) + O_C(1).$$








In particular, by choosing $C$ large enough, we may take

$$32\log n \leq E\Big(\sum_{1 \leq i < n/2} I(i \in B)I(n - i \in B)\Big) \leq \kappa \log n$$

for all sufficiently large $n$ and some $\kappa > 32$.
Observe that the restriction $i < n/2$ ensures that the boolean random variables
$I(i \in B)I(n - i \in B)$ are jointly independent. If we now apply Corollary 1.9 with
$\epsilon := 1/2$, we conclude that

$$P\Big(\sum_{1 \leq i < n/2} I(i \in B)I(n - i \in B) \geq \frac{3\kappa}{2}\log n\Big) \leq 2/n^2,$$

and

$$P\Big(\sum_{1 \leq i < n/2} I(i \in B)I(n - i \in B) \leq 16\log n\Big) \leq 2/n^2.$$

The claim follows.
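The random construction in the proof of Theorem 1.13 is simple to simulate on a finite range. The Python sketch below samples a set B ⊂ [1, N] with P(n ∈ B) = min(C√(log n / n), 1) and tabulates r_{2,B}(n). It is only a finite illustration under stated assumptions: the function names, the seed, and the values of N and C are not from the text, and a finite sample proves nothing about the infinite statement.

```python
import math
import random


def random_thin_set(N, C, seed=0):
    """Sample B in [1, N], including each n independently with
    probability min(C * sqrt(log n / n), 1)."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    B = []
    for n in range(1, N + 1):
        prob = min(C * math.sqrt(math.log(n) / n), 1.0)
        if rng.random() < prob:
            B.append(n)
    return B


def r2(B, N):
    """Return a list r with r[n] = number of ordered pairs (i, j) in B x B
    with i + j = n, for 0 <= n <= N. Elements of B must lie in [1, N]."""
    in_B = [False] * (N + 1)
    for b in B:
        in_B[b] = True
    return [
        sum(1 for i in range(1, n) if in_B[i] and in_B[n - i])
        for n in range(N + 1)
    ]


def ratios(N=2000, C=2.0, seed=0):
    """Values r_{2,B}(n) / log n for n in the upper half of [1, N]."""
    B = random_thin_set(N, C, seed)
    r = r2(B, N)
    return [r[n] / math.log(n) for n in range(N // 2, N + 1)]
```

Inspecting `min(ratios())` and `max(ratios())` for a few seeds gives a feel for how tightly r_{2,B}(n) clusters around a constant multiple of log n. The quadratic-time counting in `r2` is chosen for clarity rather than speed.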





It is quite natural to ask whether Theorem 1.13 can be generalized to arbitrary $k$.
Using the above approach, in order to obtain a basis $B$ such that $r_{k,B}(n) = \Theta(\log n)$,
we should set $P(n \in B) = cn^{1/k - 1}\ln^{1/k} n$ for all sufficiently large $n$. As before,
we have

$$r_{k,B}(n) = \sum_{x_1 + \cdots + x_k = n} I(x_1 \in B) \cdots I(x_k \in B). \qquad (1.23)$$

Although $r_{k,B}(n)$ does have the right expectation $\Theta(\log n)$, we face a major
problem: the variables $I(x_1 \in B) \cdots I(x_k \in B)$ with $k > 2$ are no longer independent.
In fact, a typical number $x$ appears in quite many ($\Omega(n^{k-2})$) solutions of
$x_1 + \cdots + x_k = n$. This dashes the hope that one can use Theorem 1.8 to conclude
the argument.
It took a long time to overcome this problem of dependency. In 1990, Erdős
and Tetali [97] successfully generalized Theorem 1.13 for arbitrary $k$:

Theorem 1.15 For any fixed $k$, there is a subset $B \subset \mathbf{N}$ such that $r_{k,B}(n) = \Theta(\log n)$
for all sufficiently large $n$. In particular, there exists a thin basis of order
$k$ for any $k$.

We shall discuss this theorem in a later section. Let us now turn instead to
another application.

1.3.2 Complementary bases 


Given a set $A \subset \mathbf{N}$ and an integer $k \geq 1$, a set $B \subset \mathbf{N}$ is a complementary basis of
order $k$ of $A$ if every sufficiently large natural number can be written as a sum of
an element in $A$ and $k$ elements in $B$ (not necessarily distinct), or equivalently if
$\mathbf{N} \setminus (A + kB)$ is finite.

As in the theory of bases, it is convenient to introduce the counting function

$$r_{A+B+\cdots+B}(n) := |\{(a, b_1, \ldots, b_k) \in A \times B^k : n = a + b_1 + \cdots + b_k\}|$$

and observe (analogously to (1.21)) that

$$\sum_{n \leq N} r_{A+B+\cdots+B}(n) \leq |A \cap [0, N]||B \cap [0, N]|^k \leq \sum_{n \leq (k+1)N} r_{A+B+\cdots+B}(n).$$




Now consider the set $\mathcal{P} = \{2, 3, 5, \ldots\}$ of primes, and let $B$ be a complementary
basis for $\mathcal{P}$ of order 1. Recall that $|\mathcal{P} \cap [0, n]| = \Theta(n/\log n)$ (Exercise 1.10.4
from the Appendix (Section 1.10)). From the preceding inequality we thus have
the lower bound

$$|B \cap [0, n]| = \Omega(\log n)$$

for all large $n$. It is not known whether this bound can actually be attained. However,
Erdős showed that $\mathcal{P}$ has a complementary base of size $O(\log^2 n)$ [92, 170]:

Theorem 1.16 $\mathcal{P}$ has a complementary base $B \subset \mathbf{Z}^+$ of order 1 such that
$|B \cap [0, n]| = O(\log^2 n)$ for all sufficiently large $n$.


Proof Again $B$ is created in a random manner, setting the events $n \in B$ to be
jointly independent with probability

$$P(n \in B) = \min\Big(C\frac{\log n}{n},\ 1\Big)$$

for some large constant $C$. From Corollary 1.10 we have

$$P(|B \cap [0, n]| \geq 10C\log^2 n) = O\Big(\frac{1}{n^2}\Big)$$

(say) for each $n$, and hence by the Borel-Cantelli lemma (Lemma 1.2) we have
with probability 1 that $|B \cap [0, n]| = O(\log^2 n)$ for all sufficiently large $n$. Thus
it suffices to show that with probability 1, $r_{\mathcal{P}+B}(n) > 0$ for all sufficiently large $n$.
By the Borel-Cantelli lemma again, it will suffice to show that

$$P(r_{\mathcal{P}+B}(n) > 0) = 1 - O\Big(\frac{1}{n^2}\Big)$$

for all large $n$. To show this, we write $r_{\mathcal{P}+B}(n) = |B \cap (n - \mathcal{P})|$. From linearity of
expectation (1.4) we have

$$E(|B \cap (n - \mathcal{P})|) = C\sum_{p \in \mathcal{P} \cap [1, n)} \frac{\log(n - p)}{n - p} + O_C(1).$$

We now use the estimate

$$\sum_{p \in \mathcal{P} \cap [1, n)} \frac{\log(n - p)}{n - p} = \Omega(\log n)$$

for all sufficiently large $n$ (see Proposition 1.54 in the Appendix); if we choose $C$
large enough, we thus conclude that

$$E(|B \cap (n - \mathcal{P})|) \geq 8\log n$$

for all sufficiently large $n$. From Corollary 1.10 (or Corollary 1.8), the desired
claim follows.














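Exercises

1.3.1 Let $\epsilon$ be the uniform distribution on $\{-1, +1\}$, and let $\epsilon_1, \ldots, \epsilon_n$ be
independent trials of $\epsilon$. For any $\lambda > 0$, prove the reflection principle

$$P\Big(\max_{1 \leq j \leq n} \sum_{i=1}^j \epsilon_i \geq \lambda\Big) = 2P\Big(\sum_{i=1}^n \epsilon_i \geq \lambda\Big).$$

Hint: Let $A \subset \{-1, 1\}^n$ be the set of $n$-tuples $(\epsilon_1, \ldots, \epsilon_n)$ such that
$\sum_{i=1}^n \epsilon_i \geq \lambda$, and let $B \subset \{-1, 1\}^n$ be the set of $n$-tuples $(\epsilon_1, \ldots, \epsilon_n)$
such that $\sum_{i=1}^n \epsilon_i < \lambda$ but $\sum_{i=1}^j \epsilon_i \geq \lambda$ for some $1 \leq j < n$. Create a
“reflection map” which exhibits a bijection between $A$ and $B$.

1.3.2 With the same notation as the previous exercise, show that

$$P\Big(\max_{1 \leq j \leq n} \sum_{i=1}^j a_i\epsilon_i \geq \lambda\Big) \leq 2P\Big(\sum_{i=1}^n a_i\epsilon_i \geq \lambda\Big)$$

for all non-negative real numbers $a_1, \ldots, a_n$.

1.3.3 By considering the case when $X_1, \ldots, X_n \in \{-1, 1\}$ are independent
variables taking values $+1$ and $-1$ with equal probability $1/2$, show that
Theorem 1.8 cannot be improved except for the constant in the exponent.

1.3.4 Let the hypotheses be as in Theorem 1.8, but with the $X_i$ complex-valued
instead of real-valued. Show that

$$P(|X - E(X)| \geq \lambda\sigma) \leq 4\max\big(e^{-\lambda^2/8}, e^{-\lambda\sigma/2\sqrt{2}}\big)$$

for all $\lambda > 0$. (Hint: if $|z| \geq \lambda\sigma$, then either $|\mathrm{Re}(z)| \geq \frac{1}{\sqrt{2}}\lambda\sigma$ or
$|\mathrm{Im}(z)| \geq \frac{1}{\sqrt{2}}\lambda\sigma$.) The constants here can be improved slightly.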

1.3.5 (Hoeffding’s inequality) Let $X_1, \ldots, X_n$ be jointly independent random
variables, taking finitely many values, with $a_i \leq X_i \leq b_i$ for all $i$ and
some real numbers $a_i < b_i$. Let $X := X_1 + \cdots + X_n$. Using the exponential
moment method, show that

$$P\Big(|X - E(X)| \geq \lambda\Big(\sum_{i=1}^n |b_i - a_i|^2\Big)^{1/2}\Big) \leq 2e^{-2\lambda^2}.$$

1.3.6 (Azuma’s inequality) Let $X_1, \ldots, X_n$ be random variables taking finitely
many values with $|X_i| \leq 1$ for all $i$. We do not assume that the $X_i$ are
jointly independent, however we do require that the $X_i$ form a martingale
difference sequence, by which we mean that
$E(X_i \mid X_1 = x_1, \ldots, X_{i-1} = x_{i-1}) = 0$ for all $1 \leq i \leq n$ and all $x_1, \ldots, x_{i-1}$.
Using the exponential moment method, establish the large deviation inequality

$$P(|X_1 + \cdots + X_n| \geq \lambda\sqrt{n}) \leq 2e^{-\lambda^2/4}. \qquad (1.24)$$

1.3.7 Let $n$ be a sufficiently large integer, and color each of the elements in $[1, n]$
red or blue, uniformly and independently at random (so each element is
red with probability 1/2 and blue with probability 1/2). Show that the
following statements hold with probability at least 0.9:
(a) there is a red arithmetic progression of length at least $\frac{\log n}{10}$;
(b) there is no monochromatic arithmetic progression of length
exceeding $10\log n$;
(c) the number of red elements and the number of blue elements in
$[1, n]$ differ by $O(n^{1/2})$;
(d) in every arithmetic progression in $[1, n]$, the numbers of red and
blue elements differ by $O(n^{1/2}\log^{1/2} n)$.

1.3.8 Let us color the elements of $[1, n]$ red or blue as in the preceding exercise.
For each $A \subset [1, n]$, let $t_A$ denote the parity of the red elements
in $A$; thus $t_A = 1$ if there are an odd number of red elements in $A$, and
$t_A = 0$ otherwise. Let $X = \sum_{A \subset [1,n]} t_A$. Show that the $t_A$ are pairwise
(but not necessarily jointly) independent, that $E(X) = 2^{n-1}$, and that
$\mathrm{Var}(X) = 2^{n-2}$. Furthermore, show that $P(X = 0) = 2^{-n}$. This shows
that Chernoff’s inequality can fail dramatically if one only assumes pairwise
independence instead of joint independence (though Chebyshev’s
inequality is of course still valid in this case).

1.3.9 For any $k \geq 1$, find a basis $B \subset \mathbf{N}$ of order $k$ such that
$|B \cap [0, n]| = \Theta_k(n^{1/k})$ for all large $n$. (This can be done constructively, without recourse
to the probabilistic method, for instance by taking advantage of the base
$k$ representation of the integers.)

1.3.10 Prove that there do not exist positive integers $k, m > 1$, and a set $B \subset \mathbf{N}$
such that $r_{k,B}(n) = m$ for all sufficiently large $n$; thus a base of order $k$
cannot be perfectly regular. (Hint: consider the complex-analytic function
$\sum_{n \in B} z^n$, defined for $|z| < 1$, and compute the $k$th power of this
function. It is rather challenging to find an elementary proof of this fact
that does not use complex analysis, or the closely related tools of Fourier
analysis.)

1.3.11 With the hypotheses of Theorem 1.8, establish the moment estimates

$$E(|X|^p)^{1/p} = O(\sqrt{p}\,\sigma + p)$$

for all $p \geq 1$.

1.3.12 With the hypotheses of Corollary 1.9, establish the inequality

$$E\binom{X}{n} \leq \frac{1}{n!}E(X)^n$$

for all $n \in \mathbf{N}$. (Hint: expand $\binom{X}{n}$ as $\sum_{i_1 < \cdots < i_n} t_{i_1} \cdots t_{i_n}$.) Use this (and
Stirling’s formula (1.52)) to derive an inequality similar to that in Corollary 1.9
in the case $\epsilon > 1$. For a generalization of this inequality, see
Lemma 1.40 below.


1.4 Correlation inequalities 


Chernoff’s inequality is useful for controlling quantities of the form $t_1 + \cdots + t_n$
where $t_1, \ldots, t_n$ are independent variables. In many applications, however, one
needs to instead control more complicated polynomial expressions of $t_1, \ldots, t_n$,
such as monotone quantities.


Definition 1.17 (Monotone increasing variables) Let $t_1, \ldots, t_n$ be jointly
independent boolean random variables. A random variable $X = X(t_1, \ldots, t_n)$ is
monotone increasing if we have

$$X(t'_1, \ldots, t'_n) \geq X(t_1, \ldots, t_n) \text{ whenever } t'_i \geq t_i \text{ for all } 1 \leq i \leq n,$$

or equivalently if $X$ is monotone increasing in each of the variables $t_i$ separately.
We call $X$ monotone decreasing if $-X$ is monotone increasing. We say that an
event $A$ is monotone increasing (resp. decreasing) if the indicator $I(A)$ is monotone
increasing (resp. decreasing).

Example 1.18 If $P(t_1, \ldots, t_n)$ is any polynomial of $t_1, \ldots, t_n$ with non-negative
coefficients, then $P$ is monotone increasing and $-P$ is monotone decreasing, and
the event $P(t_1, \ldots, t_n) \geq k$ is monotone increasing for any fixed $k$.


It is reasonable to think that any two increasing (resp. decreasing) variables 
or events are, in some way, positively correlated; intuitively, if both X and Y are 
monotone increasing (resp. decreasing), then the event that X is large (resp. small) 
should boost up the chance that Y is also large (resp. small). This intuition was 
materialized by Fortuin, Kasteleyn and Ginibre [104], motivated by problems in 
statistical mechanics: 


Theorem 1.19 (FKG inequality) Let $n \geq 0$, and let $X$ and $Y$ be two monotone
increasing variables. Then

$$E(XY) \geq E(X)E(Y)$$

or equivalently

$$\mathrm{Cov}(X, Y) \geq 0.$$

The same inequality holds for the case both $X$ and $Y$ are monotone decreasing.

Proof By replacing $X, Y$ with $-X, -Y$ if necessary, we may assume that $X$ and
$Y$ are both monotone increasing.

We use induction on $n$. The base case $n = 0$ is trivial since in this case $X$
and $Y$ are deterministic. Now assume inductively that $n \geq 1$ and the claim has
already been proven for $n - 1$. We may assume that $P(t_n = 0)$ and $P(t_n = 1)$
are non-zero since otherwise the claim follows immediately from the induction
hypothesis. Observe that the covariance $\mathrm{Cov}(X, Y)$ is unaffected if we shift $X$ and
$Y$ by constants. Thus we may normalize

$$E(X \mid t_n = 0) = E(Y \mid t_n = 0) = 0 \qquad (1.25)$$

where $E(X \mid t_n = 0)$ denotes the conditional expectation of $X$ relative to the event
$t_n = 0$. By monotonicity of $X, Y$ in the $t_n$ variable and the joint independence of
the $t_i$ we then have

$$E(X \mid t_n = 1),\ E(Y \mid t_n = 1) \geq 0. \qquad (1.26)$$

Observe that, conditioning on the event $t_n = 0$, the random variables $X, Y$ are
monotone increasing functions of $t_1, \ldots, t_{n-1}$. Thus by the induction hypothesis

$$E(XY \mid t_n = 0) \geq E(X \mid t_n = 0)E(Y \mid t_n = 0) = 0$$

and similarly

$$E(XY \mid t_n = 1) \geq E(X \mid t_n = 1)E(Y \mid t_n = 1).$$

By the total probability formula we thus have

$$E(XY) = E(XY \mid t_n = 0)P(t_n = 0) + E(XY \mid t_n = 1)P(t_n = 1) \geq E(X \mid t_n = 1)E(Y \mid t_n = 1)P(t_n = 1).$$

On the other hand, from (1.25) and another application of the total probability
formula we have

$$E(X)E(Y) = E(X \mid t_n = 1)P(t_n = 1)\,E(Y \mid t_n = 1)P(t_n = 1).$$

Since $P(t_n = 1) \leq 1$, the claim now follows from (1.26).
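For a small number of boolean variables the FKG inequality can be checked exactly by enumerating all outcomes. The Python sketch below does so with rational arithmetic. The particular probabilities and the two polynomials with non-negative coefficients (monotone increasing by Example 1.18) are illustrative assumptions, not from the text.

```python
from fractions import Fraction
from itertools import product


def expectation(f, probs):
    """Exact E f(t_1, ..., t_n) for independent boolean t_i with P(t_i = 1) = probs[i]."""
    total = Fraction(0)
    for t in product((0, 1), repeat=len(probs)):
        weight = Fraction(1)
        for t_i, p_i in zip(t, probs):
            weight *= p_i if t_i else 1 - p_i
        total += weight * f(t)
    return total


def covariance(f, g, probs):
    """Exact Cov(f, g) = E(fg) - E(f) E(g) under the same product measure."""
    return (expectation(lambda t: f(t) * g(t), probs)
            - expectation(f, probs) * expectation(g, probs))


# Illustrative data: four independent boolean variables with unequal biases.
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(3, 4), Fraction(1, 5)]


def X(t):
    return t[0] * t[1] + t[2]          # monotone increasing polynomial


def Y(t):
    return t[1] + t[0] * t[2] * t[3]   # monotone increasing polynomial


cov_increasing = covariance(X, Y, probs)                 # FKG: non-negative
cov_mixed = covariance(X, lambda t: -Y(t), probs)        # increasing vs decreasing
```

Here `cov_increasing >= 0`, as Theorem 1.19 predicts, while pairing the increasing variable X with the decreasing variable -Y flips the sign. The enumeration costs 2^n evaluations, so it is a check for toy cases only.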





From (1.1) and an easy induction we have an immediate corollary to Theorem 1.19:

Corollary 1.20 Let $A$ and $B$ be two increasing events, then

$$P(A \wedge B) \geq P(A)P(B).$$

More generally, if $A_1, \ldots, A_k$ are increasing events, then

$$P(A_1 \wedge \cdots \wedge A_k) \geq P(A_1) \cdots P(A_k).$$


1.4.1 Asymptotic complementary bases 


Now we are going to use the FKG inequality to prove a result of Ruzsa [293] 
concerning asymptotic complementary bases. 


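Definition 1.21 (Asymptotic complementary bases) Let $A \subset \mathbf{N}$ be a set of
natural numbers and $k \geq 1$. We define the lower density $\underline{\sigma}(A)$ and
upper density $\overline{\sigma}(A)$ of $A$ to be the numbers

$$\underline{\sigma}(A) := \liminf_{n \to \infty} \frac{|A \cap [0, n]|}{n}; \qquad
\overline{\sigma}(A) := \limsup_{n \to \infty} \frac{|A \cap [0, n]|}{n}.$$

If $\varepsilon > 0$ and $X \subset \mathbf{N}$, we say that $X$ is a
$(1 - \varepsilon)$-complementary base of $A$ if
$\underline{\sigma}(A + kX) \geq 1 - \varepsilon$, and that $X$ is an asymptotic
complementary base of order $k$ of $A$ if $\underline{\sigma}(A + kX) = 1$.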


Theorem 1.22 [293] Let $P = \{2, 3, 5, \ldots\}$ be the primes. For any
$0 < \varepsilon < 1$, there is a $(1 - \varepsilon)$-complementary base
$X \subset \mathbf{Z}^+$ of order 1 of $P$ with $|X \cap [1, n]| = O_\varepsilon(\log n)$
for all large $n$.

It follows that (the proof is left as an exercise)

Corollary 1.23 For any function $a(n)$ tending to infinity with $n$, there is an
asymptotic complementary base $X \subset \mathbf{Z}^+$ of order 1 of $P$ with
$|X \cap [1, n]| \leq a(n) \log n$ for all large $n$.


Corollary 1.23 improves an earlier result of Kolountzakis [214], and should also
be compared with Theorem 1.16 (note that every complementary basis is automatically
an asymptotic complementary basis). Since $P$ has density $\Theta(n/\log n)$, it is
clear that an asymptotic complementary base of $P$ should have density $\Omega(\log n)$.
Thus, Corollary 1.23 is nearly best possible.


Proof of Theorem 1.22 The theorem follows from the following finite statement. 


Lemma 1.24 For every $\varepsilon > 0$, and all natural numbers $n$ which are
sufficiently large depending on $\varepsilon$, there exists a set
$B \subset [n^{2/3}, 2n^{2/3}]$ with $|B| = O_\varepsilon(\log n)$ such that

$$|[1, x] \setminus (P + B)| \leq \varepsilon x \tag{1.27}$$

for all $n^{3/4} \leq x \leq n$.

The deduction of Theorem 1.22 from Lemma 1.24 is straightforward and is left
as an exercise. To prove Lemma 1.24, we use the probabilistic method. We choose
$B \subset [n^{2/3}, 2n^{2/3}]$ randomly, by letting the events $l \in B$ with
$l \in [n^{2/3}, 2n^{2/3}]$ be jointly independent with probability

$$\mathbf{P}(l \in B) = \frac{K \log n}{n^{2/3}}$$

where $K = K_\varepsilon$ is a large constant to be chosen later. From Corollary 1.10 we have


$$\mathbf{P}(|B| \geq 100 K \log n) \leq \frac{1}{n} \tag{1.28}$$

(say).

Now let $J := \lfloor \frac{1}{4} \log_2 n \rfloor$. If $j \in [0, J]$, we say that $j$ is good if

$$|[2n^{2/3}, n/2^j] \setminus (P + B)| \leq \frac{\varepsilon n}{2^{j+2}}.$$

It is easy to verify that if all the elements of $[0, J]$ are good, then (1.27) holds
(recall that we assume $n$ large depending on $\varepsilon$). In view of (1.28), it thus
suffices to show that

$$\mathbf{P}(j \text{ is good for all } j \in [0, J]) > \frac{1}{n}. \tag{1.29}$$


Let us first estimate the probability that a single $j \in [0, J]$ is good. Fixing
$j \in [0, J]$, we observe for each $m \in [2n^{2/3}, n/2^j]$ that

$$\begin{aligned}
\mathbf{P}(m \notin P + B)
&= \mathbf{P}(m - p \notin B \text{ for all } p \in P \cap (m - [n^{2/3}, 2n^{2/3}])) \\
&= \prod_{p \in P \cap (m - [n^{2/3}, 2n^{2/3}])} \mathbf{P}(m - p \notin B) \\
&= \prod_{p \in P \cap (m - [n^{2/3}, 2n^{2/3}])} \left(1 - \frac{K \log n}{n^{2/3}}\right) \\
&\leq \exp\left(-|P \cap (m - [n^{2/3}, 2n^{2/3}])| \frac{K \log n}{n^{2/3}}\right)
\end{aligned}$$

where we have used the independence of the events $l \in B$. By Lemma 1.53, we
conclude

$$\mathbf{P}(m \notin P + B) \leq \exp(-\Omega(K)).$$

Summing this over $m \in [2n^{2/3}, n/2^j]$ and using linearity of expectation (1.4), we
conclude

$$\mathbf{E}(|[2n^{2/3}, n/2^j] \setminus (P + B)|) \leq \exp(-\Omega(K)) \frac{n}{2^j}.$$

If we choose $K$ sufficiently large depending on $\varepsilon$, we thus see from Markov's
inequality that

$$\mathbf{P}(j \text{ is good}) \geq \frac{1}{2}.$$

Now we come to the final and most important observation: for any fixed $j$, the
event that $j$ is good is monotone increasing with respect to the indicator
variables $t_l := \mathbf{I}(l \in B)$. Thus, by Corollary 1.20,

$$\mathbf{P}(j \text{ is good for all } j \in [0, J]) \geq \prod_{j \in [0, J]} \mathbf{P}(j \text{ is good}) \geq 2^{-J-1}.$$

Since $J = \frac{1}{4} \log_2 n + O(1)$ and $n$ is assumed to be large, the claim (1.29) follows.
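
The monotonicity used in the last step (enlarging $B$ can only shrink the set of
integers missed by $P + B$) is easy to observe experimentally. The sketch below is an
illustration only: the parameters `n` and `K` are hypothetical and far from the
asymptotic regime of Lemma 1.24, and the helper names are not from the text.

```python
import math
import random

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes p <= n, in increasing order."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytes(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]

def window(n):
    """Integer points of the interval [n^(2/3), 2 n^(2/3)]."""
    a = n ** (2 / 3)
    return range(math.ceil(a), math.floor(2 * a) + 1)

def random_B(n, K, rng):
    """Each l in the window joins B independently with probability K log n / n^(2/3)."""
    prob = min(1.0, K * math.log(n) / n ** (2 / 3))
    return {l for l in window(n) if rng.random() < prob}

def uncovered(n, B, primes):
    """Count the m in [2 n^(2/3), n] that are NOT of the form p + b, p prime, b in B."""
    covered = set()
    for b in B:
        for p in primes:          # primes are sorted, so we can stop early
            if p + b > n:
                break
            covered.add(p + b)
    lo = math.ceil(2 * n ** (2 / 3))
    return sum(1 for m in range(lo, n + 1) if m not in covered)

n, K = 5000, 3                    # hypothetical parameters
rng = random.Random(0)
primes = primes_up_to(n)
B = random_B(n, K, rng)
B_bigger = B | set(list(window(n))[:20])   # add a few more elements to B
print(len(B), uncovered(n, B, primes), uncovered(n, B_bigger, primes))
```

The second count is never larger than the first, which is the sense in which each
event "$j$ is good" is increasing in the indicators $t_l$.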














Exercises 


1.4.1 Deduce Theorem 1.22 from Lemma 1.24. (Hint: the convergence of the
geometric series $1 + q + q^2 + \cdots$ for $|q| < 1$ may be useful at one point.)

1.4.2 Deduce Corollary 1.23 from Theorem 1.22.

1.4.3 Let the notation and assumptions be as in Theorem 1.19. Suppose that
each of the independent variables $t_1, \ldots, t_n$ attains the values 0 and 1 with
positive probability. Show that equality holds in Theorem 1.19 if and only
if $X$ and $Y$ depend on disjoint subsets of the random variables $t_1, \ldots, t_n$.


1.5 The Lovász local lemma


Let $(A_i)_{i \in V}$ be a finite collection of events in a probabilistic space; we will
later view the index set $V$ as the vertex set of a graph. In many situations, it is
desirable to show that there is a chance that the complementary events
$(\overline{A_i})_{i \in V}$ hold simultaneously, i.e. that
$\mathbf{P}(\bigwedge_{i \in V} \overline{A_i}) > 0$. This is particularly useful when the
$A_i$ are bad events that we would like to avoid.

If the $A_i$ are mutually independent, then the problem is trivial, as we have

$$\mathbf{P}\left(\bigwedge_{i \in V} \overline{A_i}\right) = \prod_{i \in V} \mathbf{P}(\overline{A_i}) = \prod_{i \in V} (1 - \mathbf{P}(A_i)), \tag{1.30}$$

which is positive if the $\mathbf{P}(A_i)$ are all strictly less than one. On the other
hand, mutual independence is a very strong assumption which rarely holds.

One may expect that something similar to (1.30) is still true if we allow a
sufficiently "local" dependence among the $A_i$, so that we still have good control
on $\mathbf{P}(A_i)$ even after conditioning on most of the events $A_j$. This is indeed
possible, as shown by Lovász in 1975 in a joint paper with Erdős [93]. We present a
modern version of this lemma as follows.


Lemma 1.25 (Lovász local lemma) Let $V$ be a finite set, and for each $i \in V$ let
$A_i$ be a probabilistic event. Assume that there is a directed graph $G(V, E)$ (without
loops) on the vertex set $V$ (which is known as the dependency graph of the $A_i$),
and a sequence of numbers $0 < x_i < 1$ for each $i \in V$ such that the estimate

$$\mathbf{P}\left(A_i \,\Big|\, \bigwedge_{j \in S} \overline{A_j}\right) \leq x_i \prod_{j \in V: (i,j) \in E} (1 - x_j) \tag{1.31}$$

holds whenever $i \in V$ and $S \subset V \setminus \{i\}$ is such that
$\bigwedge_{j \in S} \overline{A_j}$ has non-zero probability and $(i, j) \notin E$ for all
$j \in S$. Then for any disjoint $S, S' \subset V$ we have

$$\mathbf{P}\left(\bigwedge_{i \in S} \overline{A_i} \,\Big|\, \bigwedge_{i \in S'} \overline{A_i}\right) \geq \prod_{i \in S} (1 - x_i) > 0. \tag{1.32}$$

In particular we have

$$\mathbf{P}\left(\bigwedge_{i \in V} \overline{A_i}\right) \geq \prod_{i \in V} (1 - x_i) > 0.$$

The graph $G$ is usually referred to as the dependency graph of the $A_i$. Note
that (1.31) will hold if we have

$$\mathbf{P}(A_i) \leq x_i \prod_{(i,j) \in E} (1 - x_j)$$

and each $A_i$ is mutually independent of all of the $A_j$ with $(i, j) \notin E$ and
$j \neq i$. This was in fact the hypothesis stated in the original formulation of the
lemma. However, there are situations where these rather strong mutual independence
hypotheses are not available and one needs the full strength of Lemma 1.25. Alon and
Spencer's book [12], Chapter 5, contains many interesting applications.


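Proof of Lemma 1.25 We shall induct on the total cardinality $|S| + |S'|$. If
$|S| + |S'| = 0$ then $S$, $S'$ are empty, and the claim (1.32) is trivial. Now assume
inductively that $|S| + |S'| \geq 1$, and the claim has already been proven for smaller
values of $|S| + |S'|$. Note that the case $|S| = 0$ is trivial. To establish the claim
for $|S| \geq 1$, it suffices to do so for the case $|S| = 1$. Indeed, if $|S| > 1$, then
we can split $S = \{j\} \cup (S \setminus \{j\})$ for some $j \in S$. From the definition
of conditional probability we have

$$\mathbf{P}\left(\bigwedge_{i \in S} \overline{A_i} \,\Big|\, \bigwedge_{i \in S'} \overline{A_i}\right)
= \mathbf{P}\left(\overline{A_j} \,\Big|\, \bigwedge_{i \in S' \cup S \setminus \{j\}} \overline{A_i}\right)
\mathbf{P}\left(\bigwedge_{i \in S \setminus \{j\}} \overline{A_i} \,\Big|\, \bigwedge_{i \in S'} \overline{A_i}\right)$$

and the claim (1.32) then follows by applying the induction hypothesis to estimate
the second factor.

Thus it remains to verify the $|S| = 1$ case of (1.32). Writing $S = \{i\}$, we reduce
to showing that

$$\mathbf{P}\left(A_i \,\Big|\, \bigwedge_{j \in S'} \overline{A_j}\right) \leq x_i.$$

We split $S' = S_1 \cup S_2$ where $S_1 := \{j \in S' : (i, j) \in E\}$ are those indices
$j$ which are adjacent to $i$ in the dependency graph, and $S_2 := S' \setminus S_1$. From
the definition of conditional probability again we have

$$\mathbf{P}\left(A_i \,\Big|\, \bigwedge_{j \in S'} \overline{A_j}\right)
= \frac{\mathbf{P}\left(A_i \wedge \bigwedge_{j \in S_1} \overline{A_j} \,\big|\, \bigwedge_{j \in S_2} \overline{A_j}\right)}
{\mathbf{P}\left(\bigwedge_{j \in S_1} \overline{A_j} \,\big|\, \bigwedge_{j \in S_2} \overline{A_j}\right)}.$$

Note that by the induction hypothesis, $\bigwedge_{j \in S'} \overline{A_j}$ occurs with
positive probability. From (1.31) we have

$$\mathbf{P}\left(A_i \wedge \bigwedge_{j \in S_1} \overline{A_j} \,\Big|\, \bigwedge_{j \in S_2} \overline{A_j}\right)
\leq \mathbf{P}\left(A_i \,\Big|\, \bigwedge_{j \in S_2} \overline{A_j}\right)
\leq x_i \prod_{j \in V: (i,j) \in E} (1 - x_j).$$

On the other hand, from the induction hypothesis (since $|S_1| + |S_2| < 1 + |S'|$) we
have

$$\mathbf{P}\left(\bigwedge_{j \in S_1} \overline{A_j} \,\Big|\, \bigwedge_{j \in S_2} \overline{A_j}\right)
\geq \prod_{j \in S_1} (1 - x_j) \geq \prod_{j \in V: (i,j) \in E} (1 - x_j).$$

Combining the two, we obtain the claim.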





In practice, the following corollary of Lemma 1.25 is sometimes easier to apply.


Corollary 1.26 Let $d \geq 1$ and $0 < p < 1$ be numbers such that

$$p \leq \frac{1}{e(d+1)}$$

where $e = 2.718\ldots$ is the base of the natural logarithm. Let $V$ be a finite set, and
for each $i \in V$ let $A_i$ be a probabilistic event with $\mathbf{P}(A_i) \leq p$. Assume
also that each $A_i$ is mutually independent of all but at most $d$ of the other events
$A_j$. Then

$$\mathbf{P}\left(\bigwedge_{i \in V} \overline{A_i}\right) \geq \left(1 - \frac{1}{d+1}\right)^{|V|} > 0.$$

If $d = 0$, then Corollary 1.26 follows from (1.30). For $d \geq 1$, the corollary
follows from Lemma 1.25 by setting $x_i = \frac{1}{d+1}$ and using the fact that
$(1 - \frac{1}{d+1})^d \geq \frac{1}{e}$. The constant $e$ is best possible as shown by Shearer.
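
As a sanity check of Corollary 1.26, here is a small sketch with a hypothetical example
that does not appear in the text: toss `n_coins` independent coins arranged on a cycle,
each showing heads with probability `q`, and let $A_i$ be the event that coins $i$ and
$i+1$ are both heads. Each $A_i$ shares a coin only with its two neighbours, so one may
take $d = 2$ and $p = q^2$. The exact probability of avoiding every $A_i$ is computed by
enumeration and compared with the lower bound of the corollary.

```python
import math
from itertools import product

def prob_no_bad_event(n_coins, q):
    """Exact P(no two cyclically adjacent coins are both heads), where P(heads) = q."""
    total = 0.0
    for coins in product((0, 1), repeat=n_coins):
        if any(coins[i] and coins[(i + 1) % n_coins] for i in range(n_coins)):
            continue                      # some bad event A_i occurred
        heads = sum(coins)
        total += q ** heads * (1 - q) ** (n_coins - heads)
    return total

n_coins, q = 8, 0.3                        # hypothetical parameters
p, d = q * q, 2                            # P(A_i) = q^2; A_i depends only on A_{i-1}, A_{i+1}
hypothesis_holds = p <= 1 / (math.e * (d + 1))
lower_bound = (1 - 1 / (d + 1)) ** n_coins # the bound (1 - 1/(d+1))^{|V|}
exact = prob_no_bad_event(n_coins, q)
print(hypothesis_holds, lower_bound, exact)
```

In examples like this the bound is typically far from sharp; its value lies in being
independent of how the events overlap beyond the degree bound $d$.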


1.5.1 Colorings of the real line 


We now give an application of Corollary 1.26. This is the original result from the 
paper [93] of Erdős and Lovász, which motivated the development of the local 
lemma. 


Let us use $k$ colors $[1, k]$ to color the real numbers. (Thus, a coloring is a map
from $\mathbf{R}$ to $[1, k]$.) A subset $T$ of $\mathbf{R}$ is called colorful if it
contains all $k$ colors.


Theorem 1.27 Let $m$ and $k$ be two positive integers satisfying

$$e(m(m-1) + 1)\, k \left(1 - \frac{1}{k}\right)^m \leq 1. \tag{1.33}$$

Then for any set $S$ of real numbers with $|S| = m$, and any set $X \subset \mathbf{R}$
(possibly infinite), there is a $k$-coloring of $\mathbf{R}$ such that the translates
$x + S$ of $S$ are colorful for every $x \in X$.


Proof We first prove this theorem in the special case when X is finite, and then 
use a compactness argument to handle the general case (of course, the theorem is 
strongest when X = R). The point is that the bound (1.33) does not depend on the 
cardinality of X. 

Fix $X$ to be finite; thus $X + S$ is also finite. Note that we only need to color
the real numbers in $X + S$, since the real numbers outside of $X + S$ are irrelevant.
For each element $y$ in $X + S$, we color it randomly and independently: $y$ receives
each of the colors in $[1, k]$ with the same probability $1/k$. Let $A_x$ be the event
that the translate $x + S$ is not colorful. We need to show that

$$\mathbf{P}\left(\bigwedge_{x \in X} \overline{A_x}\right) > 0.$$

In order to apply Corollary 1.26, we first estimate $\mathbf{P}(A_x)$. If $x + S$ is not
colorful, then at least one color is missing. The probability that a particular color
(say 1) is missing is $(1 - \frac{1}{k})^{|x + S|} = (1 - \frac{1}{k})^m$. As there are
$k$ colors, we conclude

$$\mathbf{P}(A_x) \leq k \left(1 - \frac{1}{k}\right)^m.$$

(In fact we have a strict inequality as there is a positive chance that more than one
color is missing.) Next, observe that if two translates $x + S$ and $x' + S$ are disjoint,
then the events $A_x$ and $A_{x'}$ are independent. On the other hand, $x + S$ and
$x' + S$ intersect if and only if there are two elements $s_1, s_2 \in S$ such that
$x + s_1 = x' + s_2$. It follows that $x' = x + (s_1 - s_2)$. Since the number of
(ordered) pairs $(s_1, s_2)$ with $s_1 \neq s_2$ and $s_1, s_2 \in S$ is $m(m-1)$, we
conclude that each $A_x$ is independent from all but at most $m(m-1)$ events $A_{x'}$.
Set $p = k(1 - \frac{1}{k})^m$ and $d = m(m-1)$. The condition (1.33) guarantees that the
condition of Corollary 1.26 is met and this corollary implies that
$\mathbf{P}(\bigwedge_{x \in X} \overline{A_x}) > 0$, as desired.

A routine way of passing from a finite statement to an infinite one is to use a
compactness argument and that is what we do next. The space of colorings of
$\mathbf{R}$ can be identified with the product space $[1, k]^{\mathbf{R}}$, which is
compact in the product topology by Tychonoff's theorem. In this product space, for
each $x \in \mathbf{R}$ we set $K_x$ to be the set of all $k$-colorings such that
$x + S$ is colorful. It is easy to see that each $K_x$ is closed. The finite statement
proved above asserts that any finite collection of the $K_x$ has a non-empty
intersection. It follows, by compactness, that all $K_x$, $x \in \mathbf{R}$, have a
non-empty intersection. Any element in this intersection is a coloring desired by the
theorem.
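
To get a feeling for how large $m$ must be for the hypothesis (1.33) to hold, one can
simply evaluate its left-hand side. The short sketch below does this; the function
names and the search cap `m_max` are illustrative choices, not from the text.

```python
import math

def lhs_133(m, k):
    """Left-hand side of (1.33): e (m(m-1) + 1) k (1 - 1/k)^m."""
    return math.e * (m * (m - 1) + 1) * k * (1 - 1 / k) ** m

def smallest_m(k, m_max=10_000):
    """Smallest m <= m_max satisfying (1.33) for the given k, or None if there is none."""
    for m in range(1, m_max + 1):
        if lhs_133(m, k) <= 1:
            return m
    return None

for k in (2, 3, 4):
    print(k, smallest_m(k))
```

For two colors the first admissible value is $m = 9$: the left-hand side of (1.33) is
about $1.21$ at $m = 8$ and about $0.78$ at $m = 9$.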














Exercise 


1.5.1 Show that there exists a positive constant $c$ such that the following holds.
For every sufficiently large $n$, there is a graph on $n$ points which does not
contain the following two objects: a triangle and an independent set of
size $c\sqrt{n} \log n$. (An independent set is a set of vertices, no two of which
are connected by an edge.)


1.6 Janson’s inequality 


Let $t_1, \ldots, t_n$ be jointly independent boolean random variables. In Corollary 1.9
we established a large deviation inequality for the polynomial $t_1 + \cdots + t_n$. In
many applications, it is also of interest to obtain large deviation inequalities for
more general polynomials $P(t_1, \ldots, t_n)$ of the boolean variables
$t_1, \ldots, t_n$. One particularly important case is that of a boolean polynomial

$$X := \sum_{A \in \mathcal{A}} \prod_{j \in A} t_j,$$

where $\mathcal{A}$ is some collection of non-empty subsets of $[1, n]$. Observe that
boolean polynomials are automatically positive and monotone increasing, and hence
any two boolean polynomials are positively correlated via the FKG inequality
(Theorem 1.19). More generally, if $X$ and $Y$ are boolean polynomials, then $f(X)$
and $f(Y)$ will be positively correlated whenever $f$ is a monotone increasing or
decreasing function. In particular, we see that

$$\mathbf{E}(e^{-s(X+Y)}) \geq \mathbf{E}(e^{-sX})\mathbf{E}(e^{-sY}) \tag{1.34}$$

for any real number $s$. Using this fact, the exponential moment method, and some
additional convexity arguments, Janson [190] derived a powerful bound for the
lower tail probability $\mathbf{P}(X \leq \mathbf{E}(X) - T)$:


Theorem 1.28 (Janson's inequality) Let $t_1, \ldots, t_n$, $\mathcal{A}$, $X$ be as above.
Then for any $0 \leq T \leq \mathbf{E}(X)$ we have the lower tail estimate

$$\mathbf{P}(X \leq \mathbf{E}(X) - T) \leq \exp\left(-\frac{T^2}{2\Delta}\right)$$

where

$$\Delta := \sum_{A, B \in \mathcal{A}: A \cap B \neq \emptyset} \mathbf{E}\left(\prod_{j \in A \cup B} t_j\right).$$

In particular, we have

$$\mathbf{P}(X = 0) \leq \exp\left(-\frac{\mathbf{E}(X)^2}{2\Delta}\right).$$


Remark 1.29 Informally, Janson's inequality asserts that if
$\Delta = O(\mathbf{E}(X)^2)$, then $X = \Omega(\mathbf{E}(X))$ with large probability. In
the case where $\mathcal{A}$ is just the collection of singletons $\{1\}, \ldots, \{n\}$,
then $X = t_1 + \cdots + t_n$, $\Delta = \mathbf{E}(X)$, and the above claim
is then essentially (the lower half of) Corollary 1.9.
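
The quantities in Theorem 1.28 can be computed exactly in small cases. The sketch
below uses a hypothetical example chosen for illustration: $X$ counts triangles in a
random graph on `n_vertices` vertices in which each edge is present independently with
probability `p`, so that the variables $t_j$ are edge indicators and $\mathcal{A}$ is
the family of edge-triples forming a triangle. It compares the exact value of
$\mathbf{P}(X = 0)$ with the bound $\exp(-\mathbf{E}(X)^2 / 2\Delta)$.

```python
import math
from itertools import combinations, product

def janson_check(n_vertices, p):
    """Return (E(X), Delta, exact P(X = 0), Janson bound) for the triangle count X."""
    edges = list(combinations(range(n_vertices), 2))
    index = {e: i for i, e in enumerate(edges)}
    # The family A: each triangle is recorded as the set of its three edge indices.
    family = [frozenset(index[e] for e in combinations(tri, 2))
              for tri in combinations(range(n_vertices), 3)]
    mean = len(family) * p ** 3
    # Delta sums E(prod_{j in A u B} t_j) over ordered pairs (A, B) that intersect,
    # including the diagonal pairs A = B.
    delta = sum(p ** len(A | B) for A in family for B in family if A & B)
    prob_zero = 0.0
    for t in product((0, 1), repeat=len(edges)):
        present = {i for i, ti in enumerate(t) if ti}
        if not any(A <= present for A in family):      # no triangle is fully present
            prob_zero += p ** len(present) * (1 - p) ** (len(edges) - len(present))
    return mean, delta, prob_zero, math.exp(-mean ** 2 / (2 * delta))

mean, delta, prob_zero, bound = janson_check(5, 0.5)   # hypothetical parameters
print(mean, delta, prob_zero, bound)
```

With five vertices the enumeration runs over $2^{10}$ edge configurations, so this is
only practical for very small graphs.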


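The quantity $\Delta$ is somewhat inconvenient to work with directly. Using the
independence of the $t_j$, one can rewrite it as

$$\Delta = \sum_{A \in \mathcal{A}} \mathbf{E}\left(\prod_{j \in A} t_j\right) \sum_{B \in \mathcal{A}: A \cap B \neq \emptyset} \mathbf{E}\left(\prod_{j \in B \setminus A} t_j\right).$$

Since $\mathbf{E}(X) = \sum_{A \in \mathcal{A}} \mathbf{E}(\prod_{j \in A} t_j)$, we thus have

$$\Delta \leq \mathbf{E}(X) \sup_{A \in \mathcal{A}} \sum_{B \in \mathcal{A}: A \cap B \neq \emptyset} \mathbf{E}\left(\prod_{j \in B \setminus A} t_j\right). \tag{1.35}$$

We record a particular consequence of this estimate concerning quadratic boolean
polynomials that we shall use shortly.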





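Corollary 1.30 Let $t_1, \ldots, t_n$ be as above, and let
$X = \sum_{1 \leq i < j \leq n: i \sim j} t_i t_j$, where $i \sim j$ is some symmetric
relation on $[1, n]$. Then we have

$$\mathbf{P}(X = 0) \leq \exp\left(-\frac{\mathbf{E}(X)}{2 + 4 \sup_i \sum_{j: i \sim j} \mathbf{E}(t_j)}\right).$$


Proof We take $\mathcal{A} := \{\{i, j\} : i \sim j\}$. For any $A \in \mathcal{A}$, it is
easy to verify that

$$\sum_{B \in \mathcal{A}: A \cap B \neq \emptyset} \mathbf{E}\left(\prod_{j \in B \setminus A} t_j\right) \leq 1 + 2 \sup_i \sum_{j: i \sim j} \mathbf{E}(t_j)$$

and so the claim follows from (1.35) and Theorem 1.28.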


Before presenting the proof of Theorem 1.28, let us give an application. This 
application again concerns complementary bases of primes, but this time of order 2 
rather than 1. The following result (which should be compared with Theorems 1.16 
and 1.22) in the case k = 2 was recently proved by Vu [376]. 


Theorem 1.31 For any $k \geq 2$, $P$ has a complementary base $B \subset \mathbf{Z}^+$ of
order $k$ with $|B \cap [1, n]| = O(\log n)$ for all large $n$.


Proof It suffices to establish the claim when $k = 2$. To construct $B$ we shall again
use the probabilistic method. More precisely, we let $B \subset \mathbf{Z}^+$ be a random
set with the events $n \in B$ being independent with probability

$$\mathbf{P}(n \in B) = \min\left(\frac{c}{n}, 1\right)$$

for all $n \in \mathbf{Z}^+$, where $c$ is a positive constant to be determined. As before,
we will not discuss the measure-theoretic issues associated with requiring infinitely
many independent random variables, as they can be dealt with by a suitable finitization
of this argument. Let $t_n$ be the boolean random variable
$t_n := \mathbf{I}(n \in B)$. By Corollary 1.10 we have

$$\mathbf{P}(|B \cap [1, m]| \leq 10 c \log m) = 1 - O\left(\frac{1}{m^2}\right)$$

for all large $m$, and hence by the Borel–Cantelli lemma (Lemma 1.2) we have with
probability 1 that

$$|B \cap [1, m]| = O_c(\log m) \text{ for all sufficiently large } m. \tag{1.36}$$

Now for each $n \in \mathbf{Z}^+$, consider the counting function

$$r_{P+B+B}(n) := |\{(p, i, j) \in P \times B \times B : n = p + i + j\}| = \sum_{p < n} \sum_{i + j = n - p} t_i t_j.$$

This is of course a random variable for each $n$. In view of (1.36), it will suffice to
show that with probability 1, we have $r_{P+B+B}(n) \neq 0$ for all but finitely many $n$.
From the Borel–Cantelli lemma, it thus suffices to show that

$$\mathbf{P}(r_{P+B+B}(n) = 0) = O\left(\frac{1}{n^2}\right)$$

for all large $n$, if $c$ is chosen large enough.

Fix $n$ to be large. It will be convenient to work with a reduced version of
$r_{P+B+B}(n)$, namely the boolean polynomial

$$Y_n := \sum_{i > j \geq n^{2/3}: i + j \in n - P} t_i t_j.$$

Clearly we have $Y_n \leq r_{P+B+B}(n)$, and so it suffices to show that

$$\mathbf{P}(Y_n = 0) = O\left(\frac{1}{n^2}\right).$$

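We now apply Corollary 1.30 (using the relation $i \sim j$ if $i \neq j$ and
$i + j \in n - P$) to give

$$\mathbf{P}(Y_n = 0) \leq \exp\left(-\frac{\mathbf{E}(Y_n)}{2 + 4 \sup_{i \geq n^{2/3}} \sum_{j \geq n^{2/3}: i + j \in n - P} \mathbf{E}(t_j)}\right).$$

By construction of the $t_j$, and Proposition 1.54 from the Appendix, we have for
any $i \geq n^{2/3}$

$$\sum_{j \geq n^{2/3}: i + j \in n - P} \mathbf{E}(t_j) = \sum_{p \leq n - i - n^{2/3}} \min\left(\frac{c}{n - i - p}, 1\right) = O(c).$$

On the other hand, from linearity of expectation (1.3) and independence, we have

$$\begin{aligned}
\mathbf{E}(Y_n) &= \sum_{i > j \geq n^{2/3}: i + j \in n - P} \mathbf{E}(t_i)\mathbf{E}(t_j) \\
&= \sum_{i > j \geq n^{2/3}: i + j \in n - P} \frac{c^2}{ij} \\
&= c^2 \sum_{p \leq n - 2n^{2/3}} \; \sum_{i > j \geq n^{2/3}: i + j = n - p} \frac{1}{ij} \\
&= c^2 \sum_{p \leq n - 2n^{2/3}} \Omega\left(\frac{\log(n - p)}{n - p}\right) \\
&= \Omega(c^2 \log n),
\end{aligned}$$

where in the last line we again used Proposition 1.54 from the Appendix. Putting
all of these estimates together we obtain

$$\mathbf{P}(Y_n = 0) \leq \exp(-\Omega(c \log n))$$

and the claim follows by choosing $c$ to be suitably large.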





Now we are going to prove Theorem 1.28. 


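Proof of Theorem 1.28 We shall use the exponential moment method. By a
limiting argument we may assume that $\mathbf{P}(t_j = 0), \mathbf{P}(t_j = 1) > 0$ for
all $j$. We introduce the moment generating function $F(t) := \mathbf{E}(e^{-tX})$ for
any $t > 0$. By (1.16) we have

$$\mathbf{P}(X \leq \mathbf{E}(X) - T) \leq \frac{F(t)}{e^{-t(\mathbf{E}(X) - T)}}.$$

Taking logarithms, we see that we only need to establish the inequality

$$\log F(t) + t(\mathbf{E}(X) - T) \leq -\frac{T^2}{2\Delta}$$

for some $t > 0$. Unlike the situation in Theorem 1.8, the summands in $X$ are not
necessarily independent, so we cannot factorize $F(t) = \mathbf{E}(e^{-tX})$ easily.
Janson found a beautiful argument to get around this difficulty. Since $F(0) = 1$, we
see from the fundamental theorem of calculus that

$$\log F(t) = \int_0^t \frac{F'(s)}{F(s)}\, ds.$$

Direct calculation shows that

$$\begin{aligned}
F'(s) &= -\mathbf{E}(X e^{-sX}) \\
&= -\sum_{A \in \mathcal{A}} \mathbf{E}\left(e^{-sX} \prod_{j \in A} t_j\right) \\
&= -\sum_{A \in \mathcal{A}} \mathbf{E}(e^{-sX} | E_A)\mathbf{P}(E_A),
\end{aligned}$$

where $E_A$ is the event that $t_j = 1$ for all $j \in A$. Thus it suffices to show that

$$\sum_{A \in \mathcal{A}} \mathbf{P}(E_A) \int_0^t \frac{\mathbf{E}(e^{-sX} | E_A)}{F(s)}\, ds - t(\mathbf{E}(X) - T) \geq \frac{T^2}{2\Delta}$$

for some $t > 0$.

We now exploit the fact that some of the factors of $e^{-sX}$ are independent of $E_A$.
For each $A \in \mathcal{A}$, we split $X$ as $Y_A + Z_A$, which are the boolean polynomials

$$Y_A := \sum_{B \in \mathcal{A}: A \cap B \neq \emptyset} \prod_{j \in B} t_j; \qquad
Z_A := \sum_{B \in \mathcal{A}: A \cap B = \emptyset} \prod_{j \in B} t_j.$$

By (1.34) (conditioning on the variables in $E_A$), we conclude

$$\mathbf{E}(e^{-sX} | E_A) \geq \mathbf{E}(e^{-sY_A} | E_A)\mathbf{E}(e^{-sZ_A} | E_A).$$

On the other hand, $Z_A$ is independent from $E_A$ and is bounded from above by $X$;
thus

$$\mathbf{E}(e^{-sZ_A} | E_A) = \mathbf{E}(e^{-sZ_A}) \geq \mathbf{E}(e^{-sX}) = F(s).$$

Combining all these estimates, we have reduced to showing that

$$\sum_{A \in \mathcal{A}} \mathbf{P}(E_A) \int_0^t \mathbf{E}(e^{-sY_A} | E_A)\, ds - t(\mathbf{E}(X) - T) \geq \frac{T^2}{2\Delta}$$

for some $t > 0$.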


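Next, we exploit the convexity of the function $x \mapsto e^{-sx}$ via Jensen's
inequality (Exercise 1.2.4), concluding that

$$\mathbf{E}(e^{-sY_A} | E_A) \geq e^{-s\mathbf{E}(Y_A | E_A)}.$$

From linearity of expectation we have
$\sum_{A \in \mathcal{A}} \mathbf{P}(E_A) = \mathbf{E}(X)$, and so another application of
Jensen's inequality gives

$$\sum_{A \in \mathcal{A}} \mathbf{P}(E_A) e^{-s\mathbf{E}(Y_A | E_A)} \geq \mathbf{E}(X) \exp\left(-s \sum_{A \in \mathcal{A}} \frac{\mathbf{P}(E_A)}{\mathbf{E}(X)} \mathbf{E}(Y_A | E_A)\right).$$

On the other hand, from the definition of conditional probability we have

$$\sum_{A \in \mathcal{A}} \mathbf{P}(E_A)\mathbf{E}(Y_A | E_A) = \sum_{A \in \mathcal{A}} \sum_{B \in \mathcal{A}: A \cap B \neq \emptyset} \mathbf{E}\left(\prod_{j \in A} t_j \prod_{j \in B} t_j\right) = \Delta.$$

We thus have

$$\begin{aligned}
\sum_{A \in \mathcal{A}} \mathbf{P}(E_A) \int_0^t \mathbf{E}(e^{-sY_A} | E_A)\, ds - t(\mathbf{E}(X) - T) \qquad & (1.37) \\
\geq \mathbf{E}(X) \int_0^t e^{-s\Delta/\mathbf{E}(X)}\, ds - t(\mathbf{E}(X) - T) & \\
= \frac{\mathbf{E}(X)^2}{\Delta}\left(1 - e^{-t\Delta/\mathbf{E}(X)}\right) - t(\mathbf{E}(X) - T). \qquad & (1.38)
\end{aligned}$$

If we set $t := T/\Delta$, then $t\Delta/\mathbf{E}(X) = T/\mathbf{E}(X) \leq 1$, and we have

$$1 - e^{-t\Delta/\mathbf{E}(X)} = 1 - e^{-T/\mathbf{E}(X)} \geq \frac{T}{\mathbf{E}(X)} - \frac{T^2}{2\mathbf{E}(X)^2}$$

and hence

$$\begin{aligned}
\sum_{A \in \mathcal{A}} \mathbf{P}(E_A) \int_0^t \mathbf{E}(e^{-sY_A} | E_A)\, ds - t(\mathbf{E}(X) - T)
&\geq \frac{T\mathbf{E}(X)}{\Delta} - \frac{T^2}{2\Delta} - \frac{T}{\Delta}(\mathbf{E}(X) - T) \\
&= \frac{T^2}{2\Delta}
\end{aligned}$$

as desired.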














Remark 1.32 Choosing t = T/A might be convenient, but may not be optimal. 
One can have a slightly better bound by optimizing the right hand side of (1.38) 
over t. 
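
The gain from optimizing in $t$ can be made concrete. Writing $\mu := \mathbf{E}(X)$, the
expression $\frac{\mu^2}{\Delta}(1 - e^{-t\Delta/\mu}) - t(\mu - T)$ is concave in $t$,
and setting its derivative to zero gives the maximizer
$t^* = \frac{\mu}{\Delta} \log\frac{\mu}{\mu - T}$ for $0 < T < \mu$. The sketch below
compares the three exponents numerically; the input values are hypothetical and the
function name is an illustrative choice.

```python
import math

def janson_exponents(mean, delta, T):
    """Return (T^2 / (2 delta), exponent at t = T/delta, exponent at the optimal t)."""
    if not (0 < T < mean and delta > 0):
        raise ValueError("need 0 < T < mean and delta > 0")

    def exponent(t):
        # The lower bound (mean^2 / delta)(1 - exp(-t delta / mean)) - t (mean - T).
        return mean ** 2 / delta * (1 - math.exp(-t * delta / mean)) - t * (mean - T)

    t_star = mean / delta * math.log(mean / (mean - T))   # stationary point of exponent
    return T ** 2 / (2 * delta), exponent(T / delta), exponent(t_star)

simple, at_T_over_delta, optimal = janson_exponents(mean=1.25, delta=3.125, T=1.0)
print(simple, at_T_over_delta, optimal)
```

A larger exponent means a smaller tail bound $\exp(-\text{exponent})$, so the three
numbers printed are in increasing order of strength.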


Remark 1.33 The proof of Janson's inequality is not symmetric. In other words,
it cannot be extended to give a bound for the upper tail probability
$\mathbf{P}(X \geq \mathbf{E}(X) + T)$. This probability will be addressed in the next
section.


Exercises 


1.6.1 By refining the argument, show that the complementary base $B$ constructed
in the proof of Theorem 1.31 has (with high probability) the
property that $r_{P+B+B}(n) = \Omega(\log n)$ for all sufficiently large $n$.

1.6.2 Define a random graph $G(n, p)$ on the vertex set $[1, n]$ as follows. For each
pair $i, j$ ($1 \leq i < j \leq n$) draw an edge between $i$ and $j$ with probability
$p$, independently.

(a) Prove that if $p = o(n^{-1})$, then with probability $1 - o(1)$, $G(n, p)$
does not contain a triangle.

(b) Assume that $p = n^{-1+\varepsilon}$ for some small positive constant
$\varepsilon$. Bound the probability that $G$ does not contain a triangle.

1.6.3 Prove that for any $k \geq 2$ there is a basis $B$ of order $k$ with
$|B \cap [1, n]| = O(n^{1/k} \log^{1/k} n)$ for all large $n$.


1.7 Concentration of polynomials 


In previous sections, we often considered a polynomial $Y = Y(t_1, \ldots, t_n)$ of $n$
independent random variables $t_1, \ldots, t_n$, and wished to control the tail
distribution of $Y$. For instance Chernoff's inequality shows that the polynomial
$t_1 + \cdots + t_n$ is concentrated around its mean, while Janson's inequality shows
that the values of certain polynomials (especially those of low degree) could very
rarely be significantly less than the mean.

In this section, we present some further results of this type, that assert that 
certain polynomials with small degrees are strongly concentrated. These results 
can be seen as generalizing Chernoff’s bound, and also provide (in certain cases) 
the missing half (upper tail bound) of Janson’s inequality. 

To motivate the results, let us first give a classical result which works for any 
function Y (not just a polynomial) provided that the Lipschitz constant of Y is small. 


Lemma 1.34 (Lipschitz concentration inequality) Let $Y : \{0, 1\}^n \to \mathbf{R}$ be a
function such that $|Y(t) - Y(t')| \leq K$ whenever $t, t' \in \{0, 1\}^n$ differ in only
one coordinate. Then if $t_1, \ldots, t_n$ are independent boolean variables, we have

$$\mathbf{P}(|Y(t_1, \ldots, t_n) - \mathbf{E}(Y(t_1, \ldots, t_n))| \geq \lambda K \sqrt{n}) \leq 2e^{-\lambda^2/2}$$

for all $\lambda > 0$.


Remark 1.35 This inequality asserts that if each $t_i$ can only influence the random
variable $Y(t_1, \ldots, t_n)$ by at most $O(K)$, then $Y(t_1, \ldots, t_n)$ itself is
concentrated in an interval of length $O(K\sqrt{n})$ around its mean. It should be
compared with Hoeffding's inequality, which deals with the case
$Y(t_1, \ldots, t_n) := t_1 + \cdots + t_n$, and also with Corollary 1.30.


Proof By dividing $Y$ by $K$ we may renormalize $K = 1$. Introduce the
partially-conditioned random variables
$Y_0, Y_1(t_1), \ldots, Y_n(t_1, \ldots, t_n) = Y(t_1, \ldots, t_n)$ by
$Y_j(t_1, \ldots, t_j) := \mathbf{E}(Y | t_1, \ldots, t_j)$; thus $Y_j$ is the conditional
expectation of $Y$ with the first $j$ boolean variables $t_1, \ldots, t_j$ fixed. In
particular $Y_0 = \mathbf{E}(Y)$ and $Y_n = Y(t_1, \ldots, t_n)$. We can thus write

$$Y(t_1, \ldots, t_n) - \mathbf{E}(Y(t_1, \ldots, t_n)) = X_1 + \cdots + X_n$$

where $X_j := Y_j - Y_{j-1}$. One then easily verifies (using the Lipschitz property)
that $|X_j| \leq 1$ and $X_1, \ldots, X_n$ form a martingale difference sequence in the
sense of Exercise 1.3.6. The claim then follows from Azuma's inequality (1.24).
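
The Lipschitz hypothesis does not require $Y$ to be a sum or even a polynomial of low
degree. As an illustration only, the sketch below takes $Y$ to be the number of maximal
blocks of consecutive ones in $(t_1, \ldots, t_n)$, a hypothetical example for which
flipping one coordinate changes $Y$ by at most 1 (so $K = 1$), and compares the exact
tail probability with the bound $2e^{-\lambda^2/2}$. The marginals `probs` are an
arbitrary choice.

```python
import math
from itertools import product

def blocks_of_ones(t):
    """Number of maximal runs of consecutive ones in the 0/1 tuple t."""
    return sum(1 for i, ti in enumerate(t) if ti and (i == 0 or not t[i - 1]))

def tail_probability(Y, probs, threshold):
    """Exact P(|Y - E(Y)| >= threshold) for independent boolean inputs."""
    outcomes = []
    for t in product((0, 1), repeat=len(probs)):
        weight = 1.0
        for ti, pi in zip(t, probs):
            weight *= pi if ti else 1.0 - pi
        outcomes.append((weight, Y(t)))
    mean = sum(w * y for w, y in outcomes)
    return sum(w for w, y in outcomes if abs(y - mean) >= threshold)

n = 12
probs = [0.5] * n                          # hypothetical marginals
for lam in (0.5, 1.0, 1.5, 2.0):
    exact = tail_probability(blocks_of_ones, probs, lam * math.sqrt(n))
    print(lam, exact, 2 * math.exp(-lam ** 2 / 2))
```

For a function with such a small range the bound is far from tight; the point of the
lemma is that it holds uniformly over all 1-Lipschitz functions.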














The above lemma is very useful when one has uniform Lipschitz control on $Y$,
for instance if $Y = Y(t_1, \ldots, t_n)$ is a polynomial for which the partial
derivatives $\frac{\partial Y}{\partial t_i}$ are small for all $t_1, \ldots, t_n$ in the
unit cube. However in many applications (especially to thin bases), these partial
derivatives will only be small on the average. Fortunately there are analogs of the
above lemma which apply in this case, though they also require some average control on
higher derivatives of $Y$. To state the results we need some notation. Let
$Y = Y(t_1, \ldots, t_n)$ be a polynomial of $n$ real variables. We say that $Y$ is
totally positive if all of its coefficients are non-negative, and furthermore that $Y$
is regular if all the coefficients are between zero and one. We also say that $Y$ is
simplified if all of its monomials are square-free (i.e. do not contain any factor of
$t_i^2$), and homogeneous if all the monomials have the same degree. Thus for instance a
boolean polynomial is automatically regular and simplified, though not necessarily
homogeneous. Given any multi-index
$\alpha = (\alpha_1, \ldots, \alpha_n) \in \mathbf{Z}_+^n$, we define the partial
derivative $\partial^\alpha Y$ as

$$\partial^\alpha Y := \left(\frac{\partial}{\partial t_1}\right)^{\alpha_1} \cdots \left(\frac{\partial}{\partial t_n}\right)^{\alpha_n} Y(t_1, \ldots, t_n),$$

and denote the order of $\alpha$ as $|\alpha| := \alpha_1 + \cdots + \alpha_n$. For any
order $d \geq 0$, we denote
$\mathbf{E}_d(Y) := \max_{\alpha: |\alpha| = d} \mathbf{E}(\partial^\alpha Y)$; thus for
instance $\mathbf{E}_0(Y) = \mathbf{E}(Y)$, and $\mathbf{E}_d(Y) = 0$ if $d$ exceeds the
degree of $Y$. These quantities are vaguely reminiscent of Sobolev norms for the random
variable $Y$. We also define
$\mathbf{E}_{\geq d}(Y) := \max_{d' \geq d} \mathbf{E}_{d'}(Y)$.
The following result is due to Kim and Vu [203].


Theorem 1.36 Let $k \geq 1$, and let $Y = Y(t_1, \ldots, t_n)$ be a totally positive
polynomial of $n$ independent boolean variables $t_1, \ldots, t_n$. Then there exists a
constant $C_k > 0$ depending only on $k$ such that

$$\mathbf{P}\left(|Y - \mathbf{E}(Y)| \geq C_k \lambda^{k - \frac{1}{2}} \sqrt{\mathbf{E}_{\geq 0}(Y)\mathbf{E}_{\geq 1}(Y)}\right) = O_k\left(e^{-\lambda/4 + (k-1)\log n}\right)$$

for all $\lambda > 0$.


Informally Theorem 1.36 asserts that when the derivatives of $Y$ are smaller on
average than $Y$ itself, and the degree of $Y$ is small, then $Y$ is concentrated around
its mean, and in fact we have
$Y = (1 + O_k(\sqrt{\mathbf{E}_{\geq 1}(Y)/\mathbf{E}_{\geq 0}(Y)} \log^{k - \frac{1}{2}} n))\mathbf{E}(Y)$
with high probability.

In applications in additive number theory, we frequently deal with the case
when $Y$ is roughly of size $\log n$. In this case, the error term $e^{(k-1)\log n}$
renders Theorem 1.36 ineffective. We, however, have a variant which is designed to
handle this case:


Theorem 1.37 [378] Let $k, n \geq 1$ and $\beta, \gamma, \varepsilon > 0$. If
$Y = Y(t_1, \ldots, t_n)$ is a regular polynomial (not necessarily simplified) of $n$
independent boolean variables $t_1, \ldots, t_n$, which is homogeneous of degree $k$ and
obeys the expectation bounds

$$Q \log n \leq \mathbf{E}(Y) \leq n/Q; \qquad \mathbf{E}_1(Y), \ldots, \mathbf{E}_{k-1}(Y) \leq n^{-\gamma}$$

for some sufficiently large $Q = Q(k, \varepsilon, \beta, \gamma)$ (independent of $n$), then

$$\mathbf{P}(|Y - \mathbf{E}(Y)| \geq \varepsilon \mathbf{E}(Y)) \leq n^{-\beta}.$$


In the next section, we will use this theorem to prove Theorem 1.15.

The next theorem deals with the case when the expectation of $Y$ is less than one.
In this case it is convenient to remove the constant term from any derivative of $Y$
which appears. More precisely, introduce the renormalized derivative
$\partial_*^\alpha Y(t) := \partial^\alpha Y(t) - \partial^\alpha Y(0)$.


Theorem 1.38 Let $Y = Y(t_1, \ldots, t_n)$ be a simplified regular polynomial of $n$
independent boolean variables (not necessarily homogeneous) such that
$\mathbf{E}(\partial_*^\alpha Y) \leq n^{-\gamma}$ for some $\gamma > 0$ and all $\alpha$.
Then, for any $\beta > 0$, we have the bound
$\mathbf{P}(Y \geq K_{\beta,\gamma}) \leq n^{-\beta}$ for some $K_{\beta,\gamma}$ which is
independent of $n$ and $Y$.


Notice that the assumption implies that $Y$ has small expectation. Taking $\alpha$ to
be all zero, we have $\mathbf{E}(Y) \leq n^{-\gamma}$.

The proof of Theorem 1.36 relies on the so-called “divide and conquer 
martingale” technique, together with the exponential moment method. It is not 
too technical but requires lots of introduction. We thus skip it and refer the reader 
to [203]. The proof of Theorem 1.37 is more complicated. Besides the above- 
mentioned martingale technique, it also requires some non-trivial combinatorial 
considerations. Theorem 1.38 is a by-product of this proof (for details see [378]). 
These theorems have a wide range of applications in several areas and we refer the 
reader to [377] for a survey. 


1.7.1 $B_h[g]$ sets


Let us conclude this section by an application of Theorem 1.38. A set
$A \subset \mathbf{N}$ is called a $B_h[g]$ set or a $B_h[g]$ sequence if for any positive
integer $m$, the equation $m = x_1 + \cdots + x_h$,
$x_1 \leq x_2 \leq \cdots \leq x_h$, $x_i \in A$, has at most $g$ solutions; up to a
factor of $h!$, this is equivalent to requiring that $r_{h,A}(m)$ be bounded by $g$ for
all $m$. $B_h[g]$ sets were studied by Erdős and Turán in [98]. From (1.21) we see that
if $A$ is a $B_h[g]$ set, then $|A \cap [0, n]| = O_{h,g}(n^{1/h})$ for all $n$. In the
converse direction, Erdős and Turán proved


Theorem 1.39 For any $h \geq 1$ and $\varepsilon > 0$, there exists a set
$A \subset \mathbf{Z}^+$ with $|A \cap [0, n]| = \Omega_\varepsilon(n^{1/h - \varepsilon})$
for all large $n$, which is a $B_h[g]$ set for some $g = g_{h,\varepsilon}$ (or in
other words, $r_{h,A}(n)$ is uniformly bounded in $n$).


Proof By using Theorem 1.38 we can give a short proof of this theorem. As
before, we construct $A$ randomly, letting the events $n \in A$ be independent with
probability $\mathbf{P}(n \in A) = n^{1/h - 1 - \varepsilon}$. A simple application of
Corollary 1.9 and the Borel–Cantelli lemma also gives
$|A \cap [0, n]| = \Omega_{h,\varepsilon}(n^{1/h - \varepsilon})$ for all but finitely
many $n$ with probability 1. Thus it will suffice to show that $A$ is a $B_h[g]$ set with
probability 1 (perhaps after removing finitely many elements), for some suitably
large $g = g_{h,\varepsilon}$ depending only on $h$ and $\varepsilon$.

Let $t_n$ denote the indicator variables $t_n := \mathbf{I}(n \in A)$. For each $m$, we
observe that the random variable

$$Y_m = Y_m(t_1, \ldots, t_m) := \sum_{n_1 \leq \cdots \leq n_h: n_1 + \cdots + n_h = m} t_{n_1} \cdots t_{n_h}$$

will become a regular polynomial of degree $h$ in the $t_1, \ldots, t_m$ once we use the
identity $t_i^a = t_i$ for $a = 2, 3, \ldots$ to make the monomials square-free. To show
that $A$ is a $B_h[g]$ set after removing finitely many elements, it will suffice to show
that $Y_m \leq g$ for all but finitely many $m$; by the Borel–Cantelli lemma, it is
enough to establish the upper tail estimate

$$\mathbf{P}(Y_m > g) \leq m^{-2}$$

for all large $m$. From linearity of expectation and independence we have

$$\begin{aligned}
\mathbf{E}(Y_m) &= \sum_{n_1 \leq \cdots \leq n_h: n_1 + \cdots + n_h = m} n_1^{1/h - 1 - \varepsilon} \cdots n_h^{1/h - 1 - \varepsilon} \\
&\leq O_h(m^{1/h - 1 - \varepsilon}) \sum_{n_1 \leq \cdots \leq n_{h-1} \leq m} n_1^{1/h - 1 - \varepsilon} \cdots n_{h-1}^{1/h - 1 - \varepsilon} \\
&\leq m^{1/h - 1 - \varepsilon}\, O_h\left(\left(\sum_{n \leq m} n^{1/h - 1 - \varepsilon}\right)^{h-1}\right) \\
&\leq O_h(m^{-h\varepsilon}).
\end{aligned}$$

This already gives some non-trivial bound on $\mathbf{P}(Y_m > g)$ from Markov's
inequality, but does not give the required decay in $m$. However, a similar computation
to the above (which we leave as an exercise) establishes that
$\mathbf{E}(\partial_*^\alpha Y_m) = O_h(m^{-1/h})$ for all non-zero $\alpha$. The claim
now follows from Theorem 1.38.
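
For a finite set the $B_h[g]$ property can be tested directly from the definition. The
following brute-force sketch (the function names are illustrative, and the running
time grows like $|A|^h$) counts, for every $m$, the representations
$m = x_1 + \cdots + x_h$ with $x_1 \leq \cdots \leq x_h$ in $A$.

```python
from collections import Counter
from itertools import combinations_with_replacement

def max_representations(A, h):
    """Largest number of solutions of m = x_1 + ... + x_h, x_1 <= ... <= x_h in A, over all m."""
    elements = sorted(set(A))
    # combinations_with_replacement on sorted input yields each non-decreasing h-tuple once.
    counts = Counter(sum(xs) for xs in combinations_with_replacement(elements, h))
    return max(counts.values()) if counts else 0

def is_Bh_g(A, h, g):
    """True if A is a B_h[g] set."""
    return max_representations(A, h) <= g

print(max_representations([1, 2, 4, 8, 16], 2))   # powers of two: every sum is unique
print(max_representations([1, 2, 3, 4], 2))       # 5 = 1 + 4 = 2 + 3
```

The first set is a $B_2[1]$ set (a Sidon set), while the interval $\{1, 2, 3, 4\}$ is
only $B_2[2]$.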














The study of $B_h[g]$ sets is a popular topic in additive combinatorics. A detailed
discussion of this topic is beyond the scope of our book. Let us, however, mention
one new result of Cilleruelo, Ruzsa and Trujillo from [62]. Many other recent
results can be found in [62, 191, 213, 61, 145, 272].

Let $A \subset [1, N]$ be a $B_h[g]$ set. A simple counting argument (related to (1.21))
gives $\binom{|A| + h - 1}{h} \leq ghN$, which in turn yields the trivial bound
$|A| \leq (ghh!N)^{1/h}$. Cilleruelo, Ruzsa and Trujillo gave the first non-trivial
bounds for the case $g \geq 2$. They prove that $|A| \leq 1.864(gN)^{1/2} + 1$ when
$h = 2$, and that

$$F_h(g, N) \leq \frac{1}{(1 + \cos^h(\pi/h))^{1/h}} (hh!gN)^{1/h}$$

when $h > 2$, where $F_h(g, N)$ denotes the largest cardinality of a $B_h[g]$ set
contained in $[1, N]$. The proofs made use of harmonic analysis methods via the
consideration of the trigonometric polynomials $f(t) = \sum_{a \in A} e^{iat}$. The
authors also constructed sets to establish, for any $g$, the existence of a $B_2[g]$ set
$A \subset [1, N]$ with

$$|A| \geq \left(\frac{g + \lfloor g/2 \rfloor}{\sqrt{g + 2\lfloor g/2 \rfloor}} + o(1)\right) N^{1/2}.$$


Exercises 


1.7.1 Consider the random graph $G(n, p)$ defined in Exercise 1.6.2, and set
$p := n^{-1+\varepsilon}$. Let $Y$ be the number of triangles in $G(n, p)$. Give an
upper bound and a lower bound for

$$\mathbf{P}\left(Y \geq \frac{3}{2}\mathbf{E}(Y)\right).$$


1.7.2 Verify the bound $\mathbf{E}(\partial_*^\alpha Y_m) = O_h(m^{-1/h})$ claimed in the
proof of Theorem 1.39.


1.8 Thin bases of higher order 


We now return to the study of thin bases $B$ and their associated counting functions
$r_{k,B}(n)$, initiated in Section 1.3. However, in this section we can use Theorem 1.37
to present a proof of Theorem 1.15, which asserted for each $k \geq 1$ the existence
of a base $B$ of order $k$ with $r_{k,B}(n) = O_k(\log n)$ for all large $n$. This was
proven in the $k = 2$ case (see Theorem 1.13) using Chernoff's inequality, but that
method does not directly apply for higher $k$ because $r_{k,B}(n)$ cannot be easily
expressed as the sum of independent random variables.

We begin with a simple lemma on boolean polynomials that shows that if
$\mathbf{E}(X)$ is not too large, then at most points $(t_1, \ldots, t_n)$ of the sample
space, the polynomial $X$ does not contain too many independent terms (cf. Exercise 1.3.12).


Lemma 1.40 Let $X = \sum_{A \in \mathcal{A}} \prod_{j \in A} t_j$ be a boolean polynomial of $n$ independent boolean variables $t_1, \dots, t_n$, let $B \subset [1, n]$ be the random set $B := \{j \in [1, n] : t_j = 1\}$, and let $D \in \mathbf{N}$ be the random variable defined as the largest number of disjoint sets in $\mathcal{A}$ which are contained in $B$. Then for any integer $K \ge 1$ we have

$$\mathbf{P}(D \ge K) \le \frac{\mathbf{E}(X)^K}{K!}.$$


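Proof Observe that

$$\mathbf{I}(D \ge K) \le \frac{1}{K!} \sum_{A_1, \dots, A_K \in \mathcal{A},\ \text{disjoint}}\ \prod_{j \in A_1} t_j \cdots \prod_{j \in A_K} t_j,$$

since if $D \ge K$ then some $K$ disjoint sets of $\mathcal{A}$ are contained in $B$, and each of their $K!$ orderings contributes $1$ to the sum. Taking expectations of both sides and using linearity of expectation (1.3) followed by independence, we conclude

$$\mathbf{P}(D \ge K) \le \frac{1}{K!} \sum_{A_1, \dots, A_K \in \mathcal{A}} \mathbf{E}\Big(\prod_{j \in A_1} t_j\Big) \cdots \mathbf{E}\Big(\prod_{j \in A_K} t_j\Big).$$

But by linearity of expectation again, the right-hand side is just $\mathbf{E}(X)^K / K!$, and the claim follows.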
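Lemma 1.40 can be checked exactly on a small sample space. The sketch below enumerates all outcomes of $n$ independent boolean variables, computes $D$ by brute force, and returns both sides of the inequality. The helper names and the choice of family (all pairs in a six-element ground set, indexed from 0) are assumptions made for illustration only.

```python
from itertools import combinations, product
from math import factorial


def max_disjoint(sets):
    """Largest number of pairwise disjoint sets in the list (brute force)."""
    for size in range(len(sets), 0, -1):
        for combo in combinations(sets, size):
            if all(a.isdisjoint(b) for a, b in combinations(combo, 2)):
                return size
    return 0


def check_lemma(family, probs, K):
    """Return (P(D >= K), E(X)^K / K!) by enumerating every outcome.

    family: list of frozensets of indices; probs[j] = P(t_j = 1).
    """
    prob_D_ge_K = 0.0
    expectation_X = 0.0
    for outcome in product([0, 1], repeat=len(probs)):
        weight = 1.0
        for t, p in zip(outcome, probs):
            weight *= p if t else 1 - p
        B = {j for j, t in enumerate(outcome) if t}
        inside = [A for A in family if A <= B]  # terms of X equal to 1
        expectation_X += weight * len(inside)
        if max_disjoint(inside) >= K:
            prob_D_ge_K += weight
    return prob_D_ge_K, expectation_X ** K / factorial(K)
```

With the family of all 2-element subsets of a 6-element set and each $t_j$ equal to 1 with probability 0.3, one has $D \ge 2$ exactly when $|B| \ge 4$, so the left-hand side is far below the bound $\mathbf{E}(X)^2/2$ with $\mathbf{E}(X) = 1.35$.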














This lemma is particularly useful when combined with the sunflower lemma of Erdős and Rado [95]. A collection of sets $A_1, \dots, A_l$ forms a sunflower if the pairwise intersections $A_i \cap A_j$ for $i \ne j$ are all the same (the $A_i$ are called the petals of the flower). We allow this common pairwise intersection to be empty.


Lemma 1.41 (Sunflower lemma) If $\mathcal{A}$ is a collection of sets, each of size at most $k$, and $|\mathcal{A}| > (l - 1)^k k!$, then $\mathcal{A}$ contains $l$ sets forming a sunflower.
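For small families, Lemma 1.41 can be confirmed by exhaustive search. The following Python sketch (function names are illustrative, not from the text) tests the sunflower property directly from the definition and looks for $l$ sets forming a sunflower.

```python
from itertools import combinations


def is_sunflower(sets):
    """True if all pairwise intersections of the given sets coincide."""
    pairs = list(combinations(sets, 2))
    if not pairs:
        return True
    core = pairs[0][0] & pairs[0][1]
    return all((a & b) == core for a, b in pairs)


def find_sunflower(family, l):
    """Return l sets of the family forming a sunflower, or None."""
    for candidate in combinations(family, l):
        if is_sunflower(candidate):
            return candidate
    return None
```

For example, the 10 two-element subsets of a 5-element set exceed the threshold $(3-1)^2 \cdot 2! = 8$, so a sunflower with 3 petals must exist; the three edges of a triangle, on the other hand, do not form one.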


This lemma can be proven by elementary combinatorics and is left as an exercise. It has the following consequence for the counting function $r_{k,B}(n)$.


Corollary 1.42 Let $B \subset \mathbf{Z}^+$ and $k \ge 2$, and for each $n \in \mathbf{Z}^+$ let $D_{k,n}$ be the largest number of disjoint multisets² $\{x_1, \dots, x_k\}$ of elements of $B$ which sum to $n$. Then

$$r_{k,B}(n) \le k!\,k^k \max\Big(D_{k,n},\ \sup_{m<n} r_{k-1,B}(m)\Big)^k.$$


Proof Fix $n$, and consider the collection $\mathcal{A}$ of sets which arise from taking the multisets $\{x_1, \dots, x_k\}$ of elements of $B$ which sum to $n$ and then removing repeated elements. Clearly $r_{k,B}(n) \le k^k|\mathcal{A}|$. Also observe that any sunflower in $\mathcal{A}$ has cardinality at most $D_{k,n}$ (if the petals are disjoint) or $\sup_{m<n} r_{k-1,B}(m)$ (if the petals are not disjoint); the latter follows by taking one of the elements in the common intersection of the sunflower and removing it once from each of the associated multisets. The claim then follows from the sunflower lemma.


² A multiset is a set which is allowed to have repeated elements.














Using the above methods, we can now give a preliminary result towards proving 
Theorem 1.15. 


Proposition 1.43 Let $k \ge 2$, and let $B \subset \mathbf{Z}^+$ be a random subset of $\mathbf{Z}^+$, defined by letting the events $x \in B$ be independent with probability

$$\mathbf{P}(x \in B) = \min(C x^{1/k - 1} \log^{1/k} x,\ 1)$$

for some positive constant $C > 1$. Then with probability 1, we have $\sup_n r_{k',B}(n) = O_{C,k',B}(1)$ for all $1 \le k' < k$.


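Proof We induce on $k'$. The case $k' = 1$ is obvious. Now suppose that $1 < k' < k$ and the claim has already been proven for $k' - 1$. Applying Corollary 1.42, we conclude that, with probability 1,

$$r_{k',B}(n) = O_{C,k,k',B}\big((D_{k',n} + 1)^{k'}\big). \qquad (1.39)$$

On the other hand, if we apply Lemma 1.40 with $t_x := \mathbf{I}(x \in B)$ for $1 \le x \le n$, and $\mathcal{A} = \mathcal{A}_n$ equal to all the sets which arise from the multisets $\{x_1, \dots, x_{k'}\}$ that sum to $n$, then we observe that

$$\mathbf{P}(D_{k',n} \ge K) \le \frac{\mathbf{E}\big(\sum_{A \in \mathcal{A}_n} \prod_{j \in A} t_j\big)^K}{K!}$$

for any $K \in \mathbf{Z}^+$. However, from linearity of expectation (1.3) and independence we have

$$\begin{aligned}
\mathbf{E}\Big(\sum_{A \in \mathcal{A}_n} \prod_{j \in A} t_j\Big)
&= \sum_{A \in \mathcal{A}_n} \prod_{j \in A} \min(C j^{1/k-1} \log^{1/k} j,\ 1) \\
&\le O_{C,k,k'}\Big(\sum_{j_1, \dots, j_{k'}:\ j_1 + \cdots + j_{k'} = n} j_1^{1/k-1} \cdots j_{k'}^{1/k-1}\Big) \log n \\
&\le O_{C,k,k'}\Big(\sum_{j_1, \dots, j_{k'-1} \in [1,n]} j_1^{1/k-1} \cdots j_{k'-1}^{1/k-1}\, n^{1/k-1}\Big) \log n \\
&= O_{C,k,k'}\Big(\Big(\sum_{j \in [1,n]} j^{1/k-1}\Big)^{k'-1} n^{1/k-1}\Big) \log n \\
&= O_{C,k,k'}(n^{k'/k-1} \log n).
\end{aligned}$$

Since $k' < k$, we thus see that, by choosing $K$ depending on $k$ sufficiently large (e.g. $K = 2k + 1$), we have

$$\mathbf{P}(D_{k',n} \ge K) = O_{C,k,k',K}\Big(\frac{1}{n^2}\Big).$$

Applying the Borel-Cantelli lemma (Lemma 1.2) we see that with probability 1, we have $D_{k',n} < K$ for all but finitely many $n$. Combining this with (1.39) we obtain the claim.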














Now we prove Theorem 1.15. It will suffice to show that 


Proposition 1.44 Let $k \ge 2$, and let $B \subset \mathbf{Z}^+$ be a random subset of $\mathbf{Z}^+$, defined by letting the events $x \in B$ be independent with probability

$$\mathbf{P}(x \in B) = \min(C x^{1/k - 1} \log^{1/k} x,\ 1)$$

for some positive constant $C > 1$. If $C$ is sufficiently large depending on $k$, then with probability 1, we have $r_{k,B}(n) = \Theta_{C,k}(\log n)$ for all but finitely many $n$. In particular, $B$ is a thin basis of order $k$ with probability 1.


Proof We shall estimate $r_{k,B}(n)$ in terms of two related expressions:

$$R(n) := |\{(x_1, \dots, x_k) \in B^k : x_1 + \cdots + x_k = n;\ n^{0.1} < x_1 < x_2 < \cdots < x_k\}| \qquad (1.40)$$

$$E(n) := |\{(x_1, \dots, x_k) \in B^k : x_1 + \cdots + x_k = n;\ x_1 = x_2 \text{ or } x_1 \le n^{0.1}\}|. \qquad (1.41)$$

It is clear (using the symmetry of $x_1 + \cdots + x_k$ under permutations) that

$$k!\,R(n) \le r_{k,B}(n) \le k!\,R(n) + k^2 E(n).$$

We view $R(n)$ as the main term and $E(n)$ as the error term; this reflects the intuitive fact that for most representations $n = x_1 + \cdots + x_k$, the $x_j$ will be distinct and comparable in magnitude to $n$. It will suffice to show that with probability 1 we have

$$E(n) = O_{C,k,B}(1); \qquad R(n) = \Theta_{C,k,B}(\log n)$$

for all but finitely many $n$.

Let us deal first with the error term $E(n)$. We argue as in the proof of Proposition 1.43. Let $\mathcal{A}_n$ denote those sets which arise from the multisets $\{x_1, \dots, x_k\}$ with $x_1 + \cdots + x_k = n$ and either $x_1 = x_2$ or $x_1 \le n^{0.1}$. By arguing as in Corollary 1.42, we have

$$E(n) \le k!\,k^k \max\Big(D_n,\ \sup_{m<n} r_{k-1,B}(m)\Big)^k$$

where $D_n$ is the largest number of disjoint sets that one can find in $\mathcal{A}_n$. Applying Proposition 1.43, we conclude that

$$E(n) = O_{C,k,B}\big((D_n + 1)^k\big)$$

with probability 1. On the other hand, from Lemma 1.40, we have for any $K$ that

$$\mathbf{P}(D_n \ge K) \le \frac{\mathbf{E}\big(\sum_{A \in \mathcal{A}_n} \prod_{j \in A} t_j\big)^K}{K!}.$$

By arguing as in Proposition 1.43, one can establish

$$\mathbf{E}\Big(\sum_{A \in \mathcal{A}_n} \prod_{j \in A} t_j\Big) \le O_{C,k}(n^{-1/k}\, n^{0.1/k} \log n)$$

and thus, for a suitably large constant $K$ depending only on $k$,

$$\mathbf{P}(D_n \ge K) = O_{C,k}(1/n^2).$$

From the Borel-Cantelli lemma we conclude that, with probability 1,

$$E(n) = O_{C,k,B}(1)$$

for all but finitely many $n$, and so the contribution of $E(n)$ is negligible.
Now we estimate the main term $R(n)$. Observe that we can write $R(n)$ as a homogeneous boolean polynomial $Y = Y(t_1, \dots, t_n)$ of degree $k$; more explicitly, we have

$$Y(t_1, \dots, t_n) = \sum_{A \in \mathcal{A}'_n} \prod_{j \in A} t_j$$

where $\mathcal{A}'_n$ is the collection of all sets $\{x_1, \dots, x_k\}$ where $x_1 + \cdots + x_k = n$ and $n^{0.1} < x_1 < x_2 < \cdots < x_k$. Repeating the computations in Proposition 1.43 we see that

$$\mathbf{E}(Y) = \Theta_k(C^k \log n)$$

when $n$ is sufficiently large depending on $C, k$. To conclude the proof it would thus suffice by the Borel-Cantelli lemma to establish the large deviation inequality

$$\mathbf{P}\Big(|Y - \mathbf{E}(Y)| \ge \frac{1}{2}\mathbf{E}(Y)\Big) = O_{C,k}\Big(\frac{1}{n^2}\Big)$$

for all large $n$. Applying Theorem 1.37 (and choosing $C$ sufficiently large), we see that it suffices to show the derivative estimates

$$\mathbf{E}_1(Y), \dots, \mathbf{E}_{k-1}(Y) \le n^{-\gamma}$$

for all large $n$ and some $\gamma > 0$. In other words, we need to establish

$$\mathbf{E}\Big(\Big(\frac{\partial}{\partial t_1}\Big)^{\alpha_1} \cdots \Big(\frac{\partial}{\partial t_n}\Big)^{\alpha_n} Y(t_1, \dots, t_n)\Big) \le n^{-\gamma}$$

whenever $n$ is large and $1 \le \alpha_1 + \cdots + \alpha_n \le k - 1$. From the definition of $\mathcal{A}'_n$ we see that we may take $\alpha_j = 0$ for all $j \le n^{0.1}$, and all the other $\alpha_j$ equal to 0 or 1, since the above partial derivative vanishes otherwise. One can then compute the partial derivative and reduce our problem to showing that

$$\mathbf{E}\Big(\sum_{A \in \mathcal{A}'_n:\ A \supseteq A_0}\ \prod_{j \in A \setminus A_0} t_j\Big) \le n^{-\gamma}$$

whenever $A_0$ is any subset of $[n^{0.1}, n]$ of cardinality $1 \le |A_0| \le k - 1$ (this is the set of indices where $\alpha_j = 1$). Applying linearity of expectation and independence, and noting that $j \in [n^{0.1}, n]$ for all $j \in A \setminus A_0$, we conclude that


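$$\mathbf{E}\Big(\sum_{A \in \mathcal{A}'_n:\ A \supseteq A_0}\ \prod_{j \in A \setminus A_0} t_j\Big) = \sum_{A \in \mathcal{A}'_n:\ A \supseteq A_0}\ \prod_{j \in A \setminus A_0} \mathbf{P}(j \in B) \le O_{C,k}(n^{-0.1/k} \log n)$$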














and the claim follows for large n. 
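As a numerical illustration of the estimate $\mathbf{E}(Y) = \Theta_k(C^k \log n)$, the following Python sketch evaluates the expectation exactly in the simplest case $k = 2$, where $Y$ counts pairs $x_1 < x_2$ with $x_1 + x_2 = n$ and $x_1 > n^{0.1}$. The function names and the default $C = 1$ are choices made for this illustration only.

```python
import math


def inclusion_probability(x, k, C):
    """P(x in B) = min(C x^(1/k - 1) log^(1/k) x, 1), as in Proposition 1.44."""
    return min(C * x ** (1.0 / k - 1.0) * math.log(x) ** (1.0 / k), 1.0)


def expected_main_term_k2(n, C=1.0):
    """E(Y) for k = 2: sum of P(x in B) P(n - x in B) over n^0.1 < x < n - x."""
    total = 0.0
    for x in range(1, n):
        y = n - x
        if x > n ** 0.1 and x < y:
            total += inclusion_probability(x, 2, C) * inclusion_probability(y, 2, C)
    return total
```

Dividing `expected_main_term_k2(n)` by `math.log(n)` for increasing `n` gives values that stay bounded above and below, in line with the $\Theta(\log n)$ behaviour used in the proof.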


Remark 1.45 The proof above is from [378] and is based on the proof of Theorem 1.48 in [379]. The original proof in [98] was different and did not use Theorem 1.37.


Exercises 


1.8.1 Let $A \subset \mathbf{Z}^+$ be a set of $n$ different integers. Prove that $A$ contains a subset $B$ of cardinality $\Omega(\log n)$ with the following property: no two elements of $B$ add up to an element of $A$ (thus $r_{2,B}(m)$ vanishes for all $m \in A$, or equivalently $A \cap 2B = \emptyset$).

1.8.2 Prove Lemma 1.41. (Hint: first use the pigeonhole principle to show that if $|\mathcal{A}| > (l - 1)^k k!$, then either $\mathcal{A}$ contains $l$ disjoint sets, or there exist at least $|\mathcal{A}|/((l - 1)k)$ sets in $\mathcal{A}$ which all have a common element $x_0$. Then use induction on $k$.)


1.9 Thin Waring bases 


Recall that a thin basis of order $k$ is a set $B \subset \mathbf{N}$ such that $r_{k,B}(n) = O(\log n)$ for all large $n$. Theorem 1.15, proved above, asserts that $\mathbf{N}$ contains a thin basis of any order. Given the abundance of classical bases such as the squares and primes, it is then natural to pose the following question:


Question 1.46 Let $A$ be any fixed basis of order $k$. Does $A$ contain a thin subbasis $B$?


Note that Sidon's original question can be viewed as the $k = 2$, $A = \mathbf{N}$ case of this question. From (1.21) we know that a thin basis $B$ enjoys the bounds

$$|B \cap [0, N]| = \Omega(N^{1/k}); \qquad |B \cap [0, N]| = O_k(N^{1/k} \log^{1/k} N)$$

for all large $N$. Thus we can consider the following weaker version of Question 1.46:


Question 1.47 Let $A$ be any fixed basis of order $k$. Does $A$ contain a subbasis $B$ with $|B \cap [0, N]| = O(N^{1/k} \log^{1/k} N)$ for all large $N$?


Question 1.47 has been investigated intensively for the Waring bases $\mathbf{N}^{\wedge r} = \{0^r, 1^r, 2^r, \dots\}$, especially when $r = 2$ [90, 56, 387, 388, 384, 331]. For these bases it is known that if $k$ is sufficiently large depending on $r$, then $\mathbf{N}^{\wedge r}$ is a basis of order $k$, and furthermore that

$$r_{k,\mathbf{N}^{\wedge r}}(n) = \Theta_{k,r}(n^{k/r - 1}); \qquad (1.42)$$

note that this is consistent with (1.21).
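The counting function $r_{k,\mathbf{N}^{\wedge r}}(n)$ is easy to tabulate for small parameters. The Python sketch below (the function name is illustrative) counts ordered $k$-tuples of $r$th powers of non-negative integers with a given sum by dynamic programming. It makes no claim about the range of $k$ in which (1.42) holds, which the text only asserts for $k$ sufficiently large depending on $r$.

```python
def power_representations(n_max, k, r):
    """counts[n] = number of ordered k-tuples of r-th powers of
    non-negative integers summing to n, for 0 <= n <= n_max."""
    powers = []
    x = 0
    while x ** r <= n_max:
        powers.append(x ** r)
        x += 1
    counts = [0] * (n_max + 1)
    counts[0] = 1  # the empty sum
    for _ in range(k):
        new_counts = [0] * (n_max + 1)
        for total, ways in enumerate(counts):
            if ways:
                for p in powers:
                    if total + p > n_max:
                        break
                    new_counts[total + p] += ways
        counts = new_counts
    return counts
```

For example, `power_representations(25, 2, 2)[25]` counts the four ordered representations $0 + 25$, $25 + 0$, $9 + 16$, $16 + 9$, and with `k = 4, r = 2` every entry is positive, as Lagrange's four-square theorem predicts.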

Choi, Erdős and Nathanson proved in [56] that $\mathbf{N}^{\wedge 2}$, the set of squares, contains a subbasis $B$ of order 4, with $|B \cap [0, N]| = O_\varepsilon(N^{1/3+\varepsilon})$ for all $N > 1$ and all $\varepsilon > 0$. This was generalized by Zöllner [387, 388], who showed that for any $k \ge 4$ there was a subbasis $B \subset \mathbf{N}^{\wedge 2}$ of order $k$ with $|B \cap [0, N]| = O_k(N^{1/k+\varepsilon})$ for any $\varepsilon > 0$ and $N > 1$. This bound was then sharpened further by Wirsing to $|B \cap [0, N]| = O_k(N^{1/k} \log^{1/k} N)$; from (1.21) we know that this is sharp except for the logarithmic factor. A short proof of Wirsing's result for the case $k = 4$ was given by Spencer in [331]. For $r \ge 3$, much less was known. In 1980, Nathanson [259] proved that $\mathbf{N}^{\wedge r}$ contains a subbasis of some order with density $o(N^{1/r})$. In the same paper, he posed a special case of Question 1.47, when $A = \mathbf{N}^{\wedge r}$.


In [379], Vu positively answered Question 1.46 (and hence Question 1.47) for the case $A = \mathbf{N}^{\wedge r}$ for any $r > 1$:


Theorem 1.48 For any fixed $r$ there is an integer $k_0$ such that the following holds. For any $k \ge k_0$, the set $\mathbf{N}^{\wedge r}$ of all $r$th powers contains a thin basis $B$ of order $k$. In particular, from (1.21) we have $|B \cap [0, N]| = O_k(N^{1/k} \log^{1/k} N)$ for all large $N$.


Remark 1.49 The sharp concentration result in Theorem 1.37 was first developed 
in order to prove Theorem 1.48. 


Just as Theorem 1.15 followed from Proposition 1.44, Theorem 1.48 is an immediate consequence of


Proposition 1.50 Let $k, r \ge 2$, and let $B$ be a random subset of $(\mathbf{Z}^+)^{\wedge r}$, defined by letting the events $x^r \in B$ be independent with probability

$$\mathbf{P}(x^r \in B) = \min(C x^{r/k - 1} \log^{1/k} x,\ 1)$$

for some positive constant $C > 1$. If $k$ is sufficiently large depending on $r$, and $C$ is sufficiently large depending on $k, r$, then with probability 1 we have $r_{k,B}(n) = \Theta_{C,k,r}(\log n)$ for all but finitely many $n$. In particular, $B$ is a thin basis of order $k$ with probability 1.


Proof (Sketch) As in the proof of Proposition 1.44, it suffices to show that with probability 1 we have

$$E(n) = O_{C,k,r,B}(1); \qquad R(n) = \Theta_{C,k,r,B}(\log n)$$

for all but finitely many $n$, where $R(n)$ and $E(n)$ were defined in (1.40), (1.41). The contribution of $E(n)$ can be dealt with by similar arguments to the previous section and is left as an exercise, so we focus on $R(n)$. As before we can write $R(n)$ as a boolean polynomial $Y_n = Y_n(t_1, \dots, t_m)$, where $m = \lfloor n^{1/r} \rfloor$, $t_x = \mathbf{I}(x^r \in B)$, and

$$Y_n = \sum_{A \in \mathcal{A}_n} \prod_{x \in A} t_x$$

where $\mathcal{A}_n$ is the collection of sets $\{x_1, \dots, x_k\}$ of positive integers with $x_1^r + \cdots + x_k^r = n$ and $n^{0.1} < x_1^r < \cdots < x_k^r$. Given the framework presented in the last section, the substantial difficulty remaining is to estimate the expectations of $Y_n$ and its partial derivatives. In the following, we shall focus on the expectation of $Y_n$, establishing in particular that

$$\mathbf{E}(Y_n) = \Theta_{k,r}(C^k \log n).$$

This is the main estimate, and the remainder of the argument proceeds as in Proposition 1.44. Notice that

$$\mathbf{E}(Y_n) = C^k \sum_{x_1 < \cdots < x_k,\ \{x_1, \dots, x_k\} \in \mathcal{A}_n}\ \prod_{j=1}^{k} x_j^{r/k - 1} \log^{1/k} x_j;$$

since all the $x_j$ range between $n^{0.1/r}$ and $n^{1/r}$, it thus suffices to show that

$$\sum_{x_1 < \cdots < x_k,\ \{x_1, \dots, x_k\} \in \mathcal{A}_n} x_1^{r/k - 1} \cdots x_k^{r/k - 1} = \Theta_{k,r}(1). \qquad (1.43)$$

This bound implies, but is a little bit stronger than, the standard bound (1.42), as the estimate also asserts some improved bound on the counting function $r_{k,\mathbf{N}^{\wedge r}}(n)$ when one or more of the summands are restricted to be small (so that the corresponding weight $x_j^{r/k - 1}$ is large).

The proof of (1.43) is a standard but lengthy application of the Hardy-Littlewood circle method, and is beyond the scope of this book. The reader may consult [379] for the full proof.














Wooley [382] has shown that one can set $k_0 = O(r \log r)$. This is (up to a constant factor) also the best current bound for $k$ in (1.42). His proof also relies on Theorem 1.37, but the number-theoretic part is different.


Exercise 


1.9.1 In the proof of Proposition 1.50, verify that with probability 1 one has $E(n) = O_{C,k,r,B}(1)$ for all but finitely many $n$.


1.10 Appendix: the distribution of the primes 


Several results in this chapter relied on facts concerning the distribution of the primes

$$P = \{2, 3, 5, \dots\}.$$

The distribution of this set is of course a very well-studied subject in analytic number theory, with one of the fundamental results being the prime number theorem

$$|P \cap [1, n]| = (1 + o(1))\frac{n}{\log n}. \qquad (1.44)$$

An equivalent formulation is that if $p_k$ denotes the $k$th prime, then $p_k = (1 + o(1))k \log k$. The famous Riemann hypothesis, which is still unsolved, is equivalent to the stronger statement that

$$|P \cap [1, n]| = \int_2^n \frac{dx}{\log x} + O_\varepsilon(n^{1/2+\varepsilon}) \qquad (1.45)$$

for any $\varepsilon > 0$, or equivalently that $p_k = k \log k + O_\varepsilon(k^{1/2+\varepsilon})$ for any $\varepsilon > 0$.

The prime number theorem is rather deep and will not be proven here. In this Appendix we present some related results, most of which have surprisingly elementary and beautiful proofs. As they are number-theoretical rather than probabilistic in nature we have chosen to place these results in an appendix to this chapter.

We begin with some classical estimates of Chebyshev and Mertens. As is customary, when summing over a variable $p$, $p$ is understood to denote a prime.


Proposition 1.51 (Elementary prime number estimates) Let $n > 1$ be an integer. Then we have the estimates

$$\sum_{p \le n} \log p = O(n) \qquad (1.46)$$

$$\sum_{p \le n} \frac{\log p}{p} = \log n + O(1) \qquad (1.47)$$

$$\sum_{p \le n} \frac{1}{p} = \log\log n + O(1). \qquad (1.48)$$
psn 


Remark 1.52 With the prime number theorem, we can improve (1.46) to $\sum_{p \le n} \log p = (1 + o(1))n$, but it is not necessary to do so for our applications here.
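The three estimates of Proposition 1.51 are easy to observe numerically. The following Python sketch computes the three sums with a simple sieve of Eratosthenes; the function names are illustrative and the sieve is only one of many possible ways to list primes.

```python
import math


def primes_up_to(n):
    """All primes p <= n, by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]


def prime_sums(n):
    """Return the left-hand sides of (1.46), (1.47), (1.48)."""
    ps = primes_up_to(n)
    theta = sum(math.log(p) for p in ps)
    log_over_p = sum(math.log(p) / p for p in ps)
    reciprocals = sum(1.0 / p for p in ps)
    return theta, log_over_p, reciprocals
```

For `n = 10**4` the first sum is a little below $n$, while the second and third differ from $\log n$ and $\log\log n$ respectively by bounded amounts, as the proposition asserts.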


Proof We first prove (1.46). Without loss of generality we may take $n$ to be a power of two. Consider the binomial coefficient $\binom{2n}{n}$. From Pascal's formula we know that $\binom{2n}{n} \le 4^n$. On the other hand, it is clear that every prime between $n$ and $2n$ will divide $\binom{2n}{n}$. Thus

$$\prod_{n < p < 2n} p \le 4^n.$$

Taking logarithms we conclude

$$\sum_{n < p < 2n} \log p = O(n).$$

Applying this bound to $n/2$, $n/4$, and so forth, and then summing the geometric series, the claim (1.46) follows.

Now we prove (1.47). This is a similar argument but based around the factorial $n!$ instead of $\binom{2n}{n}$. Observe that the only primes dividing $n!$ are those less than or equal to $n$. For each prime $p \le n$, there are $\lfloor n/p \rfloor$ numbers (between 1 and $n$) divisible by $p$, $\lfloor n/p^2 \rfloor$ numbers (between 1 and $n$) divisible by $p^2$, and so on. Thus

$$n! = \prod_{p \le n} p^{\lfloor n/p \rfloor + \lfloor n/p^2 \rfloor + \cdots}. \qquad (1.49)$$

Taking the logarithm of both sides and applying Stirling's formula (Exercise 1.10.1) we obtain

$$n \log n + O(n) = \sum_{p \le n} (\lfloor n/p \rfloor + \lfloor n/p^2 \rfloor + \cdots) \log p.$$

Since

$$\lfloor n/p \rfloor + \lfloor n/p^2 \rfloor + \cdots = \frac{n}{p} + O(1) + O\Big(\frac{n}{p^2}\Big),$$

we conclude, after some rearranging, that

$$\sum_{p \le n} \frac{n \log p}{p} = n \log n + O(n) + \sum_{p \le n} O(\log p) + O\Big(n \sum_{p \le n} \frac{\log p}{p^2}\Big).$$

Since $\sum_k \frac{\log k}{k^2}$ is convergent, the last term is $O(n)$. The claim now follows from (1.46).
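The factorization (1.49) can be verified directly for small $n$. The Python sketch below (function names are illustrative) computes the exponent $\lfloor n/p \rfloor + \lfloor n/p^2 \rfloor + \cdots$ and rebuilds $n!$ from it, using trial division to recognise primes.

```python
from math import factorial


def legendre_exponent(n, p):
    """floor(n/p) + floor(n/p^2) + ..., the exponent of p in n!."""
    exponent, power = 0, p
    while power <= n:
        exponent += n // power
        power *= p
    return exponent


def factorial_from_primes(n):
    """Rebuild n! from the product over primes p <= n in (1.49)."""
    result = 1
    for p in range(2, n + 1):
        if all(p % d for d in range(2, int(p ** 0.5) + 1)):  # p is prime
            result *= p ** legendre_exponent(n, p)
    return result
```

For example, the exponent of 2 in $10!$ is $5 + 2 + 1 = 8$.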

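We shall deduce (1.48) from (1.47) using Abel's summation technique, rewriting one partial sum over primes as an average of others. Observe from the fundamental theorem of calculus that

$$\frac{1}{p} = \frac{\log p}{p} \cdot \frac{1}{\log p} = \frac{\log p}{p} \int_1^\infty \mathbf{I}(t \ge p) \frac{dt}{t \log^2 t}$$

and hence

$$\sum_{p \le n} \frac{1}{p} = \sum_{p \le n} \frac{\log p}{p} \int_1^\infty \mathbf{I}(t \ge p) \frac{dt}{t \log^2 t}.$$

Swapping the sum and integral, we obtain

$$\sum_{p \le n} \frac{1}{p} = \int_1^\infty \Big(\sum_{p \le \min(n,t)} \frac{\log p}{p}\Big) \frac{dt}{t \log^2 t}.$$

The inner sum is empty for $t < 2$. Applying (1.47), we obtain

$$\sum_{p \le n} \frac{1}{p} = \int_2^\infty (\log \min(n,t) + O(1)) \frac{dt}{t \log^2 t}.$$

Since $\int_2^\infty \frac{dt}{t \log^2 t}$ is absolutely convergent, the $O(1)$ term contributes $O(1)$, while the main term equals $\int_2^n \frac{dt}{t \log t} + \log n \int_n^\infty \frac{dt}{t \log^2 t} = \log\log n + O(1)$; the claim follows.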











We now turn to a deeper fact concerning the distribution of primes in intervals. 


Theorem 1.53 For all sufficiently large $n$, we have $|P \cap [n - x, n)| = \Theta\big(\frac{x}{\log n}\big)$ for all $n^{2/3} < x < n$.


Results of this type were first obtained by Hoheisel [183]; the result as claimed is due to Ingham [188]. Note that this theorem follows immediately from the Riemann hypothesis (1.45). However, this theorem can be proven without using the Riemann hypothesis, relying instead on some weaker (but still very non-trivial) facts on the distribution of zeroes of the Riemann zeta function: see [170]. We remark that if one only seeks the upper bound on $|P \cap [n - x, n)|$ then one can use relatively elementary sieve theory methods to establish the claim. The constant 2/3 has been lowered (the current record is 7/12, see [187], [178]). However, for the applications here, any exponent less than 1 will suffice.

We now combine this theorem with the Abel summation method to establish 
some further estimates on sums involving primes. 


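Proposition 1.54 Let $n$ be a large integer. Then we have the estimates

$$\sum_{p \in P \cap [1, n - n^{2/3})} \frac{1}{n - p} = \Theta(1) \qquad (1.50)$$

$$\sum_{p \in P \cap [1, n - n^{2/3})} \frac{\log(n - p)}{n - p} = \Theta(\log n). \qquad (1.51)$$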


Proof We begin by proving (1.50). From the fundamental theorem of calculus we have

$$\frac{1}{n - p} = \int_1^\infty \mathbf{I}\big(p \in [n - x, n - n^{2/3})\big) \frac{dx}{x^2}$$

for all $p \in P \cap [1, n - n^{2/3})$, and hence

$$\sum_{p \in P \cap [1, n - n^{2/3})} \frac{1}{n - p} = \int_1^\infty |P \cap [n - x, n - n^{2/3})| \frac{dx}{x^2}.$$

The integrand vanishes when $x < n^{2/3}$. When $n^{2/3} \le x \le 2n^{2/3}$, Theorem 1.53 shows that the integrand is $O\big(\frac{n^{2/3}}{x^2 \log n}\big)$, while for $x > 2n^{2/3}$ another application of Theorem 1.53 shows that the integrand is $\Theta\big(\frac{1}{x \log n}\big)$ when $x \le n$ and $O\big(\frac{n}{x^2 \log n}\big)$ when $x > n$. Putting all these estimates together we obtain (1.50). The estimate (1.51) then follows immediately from (1.50) since $\log(n - p) = \Theta(\log n)$ when $p \in [1, n - n^{2/3}]$.
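The sum in (1.50) can be evaluated numerically for moderate $n$ to see that it stays of bounded size. The Python sketch below is only an illustration (the function name is hypothetical, and a finite computation of course proves nothing about large $n$).

```python
def shifted_reciprocal_sum(n):
    """Sum of 1/(n - p) over primes p in [1, n - n^(2/3))."""
    cutoff = n - n ** (2.0 / 3.0)
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for q in range(2, int(n ** 0.5) + 1):
        if sieve[q]:
            sieve[q * q :: q] = bytearray(len(range(q * q, n + 1, q)))
    return sum(1.0 / (n - p) for p in range(2, n + 1) if sieve[p] and p < cutoff)
```

Evaluating `shifted_reciprocal_sum` at `10**4` and `10**5` gives values of comparable size, consistent with the sum neither growing nor decaying with $n$.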














Exercises 


1.10.1 By approximating the sum $\sum_{m=1}^{n} \log m$ by the integral $\int_1^n \log x\,dx$, prove Stirling's formula

$$\log n! = n \log n - n + O(\log n) \qquad (1.52)$$

for all $n > 1$.

1.10.2 Using Proposition 1.51, show that there is a constant $c$ so that there is always a prime between $n$ and $cn$ for every positive integer $n$.

1.10.3 By being more careful in the proof of (1.46), show that

$$\sum_{p \le n} \log p \le 2n \log 2 + O(n^{1/2})$$

and

$$\sum_{n < p < 2n} \log p + \sum_{p < 2n/3} \log p \ge 2n \log 2 - O(n^{1/2}),$$

and conclude Bertrand's postulate, namely that for every sufficiently large integer $n$ there exists a prime between $n$ and $2n$. (This argument is due to Ramanujan. Bertrand's postulate in fact holds for all integers $n$, as the case of small $n$ can be verified directly.)

1.10.4 Without using the prime number theorem, prove that $|P \cap [1, n]| = \Theta\big(\frac{n}{\log n}\big)$; this is known as Chebyshev's theorem. This theorem is of course superseded by the prime number theorem $\pi(n) = (1 + o(1))\frac{n}{\log n}$, but has the advantage of having a short elementary proof.

1.10.5 Prove that $p_k = O(k \log k)$, where $p_k$ denotes the $k$th prime. Again, this is superseded by the prime number theorem $p_k = (1 + o(1))k \log k$.

1.10.6 Define the von Mangoldt function $\Lambda : \mathbf{Z}^+ \to \mathbf{R}$ by setting $\Lambda(n) := \log p$ if $n > 1$ is a power of a prime $p$, and $\Lambda(n) = 0$ otherwise. Show that

$$\sum_{d | n} \Lambda(d) = \log n \qquad (1.53)$$

for all integers $n \ge 1$. Use this to prove that

$$\Big(\sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s}\Big)\Big(\sum_{n=1}^{\infty} \frac{1}{n^s}\Big) = \sum_{n=1}^{\infty} \frac{\log n}{n^s}$$

for all real numbers $s > 1$. Also, use (1.53) to give an alternative proof of (1.49).

1.10.7 Using the preceding exercise, show that

$$\sum_{p} \frac{\log p}{p^s} = \frac{1}{s - 1} + O(1)$$

for all $s > 1$; integrate this to conclude

$$\sum_{p} \frac{1}{p^s} = \log \frac{1}{s - 1} + O(1) \qquad (1.54)$$

for all $s > 1$. Show that these estimates can also be deduced from Proposition 1.51 via Abel's method. Conversely, use (1.54) and (1.46) to give an alternative proof of (1.48).

1.10.8 Using Abel's summation method, show that the prime number theorem $\pi(x) = (1 + o(1))\frac{x}{\log x}$ is equivalent to the estimate $\sum_{n \le x} \Lambda(n) = (1 + o(1))x$.

1.10.9 By being more careful in the proof of (1.48), show that

$$\sum_{p \le n} \frac{1}{p} = \log\log n + C + O\Big(\frac{1}{\log n}\Big)$$

for some absolute constant $C$. Use this to deduce Mertens' theorem

$$\prod_{p \le n} \Big(1 - \frac{1}{p}\Big) = \frac{C' + o(1)}{\log n} \qquad (1.55)$$

for some other absolute constant $C'$ and all $n > 1$. (In fact one has $C' = e^{-\gamma}$, where $\gamma = 0.577\dots$ is Euler's constant.)








2 





Sum set estimates 


Many classical problems in additive number theory revolve around the study of sum sets for specific sets $A, B$ (though one typically works with infinite sets rather than finite ones). For instance, if $\mathbf{N}^{\wedge 2} := \{0, 1, 4, 9, 16, \dots\}$ is the set of square numbers, then it is a famous theorem of Lagrange that $4\mathbf{N}^{\wedge 2} = \mathbf{N}$, i.e. every natural number is the sum of four squares; if $P := \{2, 3, 5, 7, 11, \dots\}$ is the set of prime numbers, then it is a famous theorem of Vinogradov that $(2 \cdot \mathbf{N} + 1) \setminus 3P$ is finite (i.e. every sufficiently large odd number is the sum of three primes); in fact it is conjectured that this exceptional set consists only of 1, 3, and 5. The corresponding result for $(2 \cdot \mathbf{N}) \setminus 2P$ remains open; the infamous Goldbach conjecture asserts that $2P$ contains every even integer greater than 2, but this conjecture remains far from resolution.
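The iterated sum sets mentioned above are simple to compute for finite truncations. The following Python sketch (function names are illustrative) forms $kA$ for a finite set of non-negative integers, truncated to an interval $[0, \text{bound}]$, and can be used to confirm the statements about squares and primes in a small range; such a finite check is of course no substitute for the theorems.

```python
def sumset(A, B):
    """The sum set A + B = {a + b : a in A, b in B}."""
    return {a + b for a in A for b in B}


def iterated_sumset(A, k, bound):
    """The k-fold sum set kA of a set A of non-negative integers,
    truncated to [0, bound]."""
    result = {0}
    for _ in range(k):
        result = {s for s in sumset(result, A) if s <= bound}
    return result
```

With `squares = {x * x for x in range(23)}`, the call `iterated_sumset(squares, 4, 500)` returns every integer from 0 to 500, while three squares already miss the number 7.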

In this text, we shall not focus on these types of problems, which rely heavily on the specific number-theoretic structure of the sets involved. Instead, we shall focus on the analysis of sum sets $A + B$ and related objects for more general sets $A, B$. To simplify the discussion we shall focus primarily on additive sets $A, B$, which are finite and non-empty subsets of an additive group such as $\mathbf{Z}$; thus our theory will not cover infinite sets such as the squares $\mathbf{N}^{\wedge 2}$ or the primes $P$ directly, although one can certainly use this theory to analyze those sets simply by considering finite truncations, say to an interval $[0, N]$.

A fundamental problem in this field is the inverse sum set problem: if $A + B$ or $A - B$ is small, what can one say about $A$ and $B$? A more specific question is as follows: if $A$ is a finite non-empty subset of integers such that $|A + A| = K|A|$ for some small number $K$, what can one say about $A$? Here and in the rest of the text we use $|A|$ to denote the cardinality of a finite set $A$. The number $K := |A + A|/|A|$ is referred to as the doubling constant of $A$ and will be denoted in this text by $\sigma[A]$. It is easy to see that this constant is at least 1, but it can be much larger; for instance, if $A$ is a geometric progression such as $A = 2^{\wedge}[0, N) = \{1, 2, 2^2, \dots, 2^{N-1}\}$ then one can easily verify that $\sigma[A] = (N + 1)/2$, so the doubling constant can be arbitrarily large; indeed for "generic" sparse sets $A$ we will have $\sigma[A] = (|A| + 1)/2$.

At the other extreme, if $A$ is an arithmetic progression $A := a + [0, N) \cdot r = \{a, a + r, a + 2r, \dots, a + (N - 1)r\}$ of length $N$ then one can check that $A$ has doubling constant $\sigma[A] = 2 - \frac{1}{N}$. Thus arithmetic progressions are examples of sets with small doubling constant. One can perturb this example to produce a number of other examples of sets with small doubling constant; for instance if $A$ is the above arithmetic progression, and we let $A'$ be a subset of $A$ of cardinality $N/2$ (say), then one can easily check that $A'$ has doubling constant at most 4. Another example comes from adding an arbitrary integer $n$ to $A$; then the set $A \cup \{n\}$ also has doubling constant at most 4.

One can generalize the concept of an arithmetic progression, to create more sets with small doubling constant. Consider the set

$$A := a + [0, (N_1, N_2)) \cdot (v_1, v_2) = \{a + n_1 v_1 + n_2 v_2 : 0 \le n_1 < N_1;\ 0 \le n_2 < N_2\},$$

where $a, v_1, v_2$ are integers, $N_1, N_2$ are positive integers and $n_1, n_2$ are understood to lie in the integers; this is an example of a generalized arithmetic progression of rank 2. One can verify that such sets have a doubling constant of at most 4. Note that such sets can look quite different from an ordinary arithmetic progression if $N_1, N_2$ are large and $v_1, v_2$ are very widely separated.
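The doubling constants quoted for these examples can be confirmed on concrete sets. The Python sketch below (the function name and the particular parameters are chosen for illustration) computes $\sigma[A] = |A + A|/|A|$ as an exact fraction.

```python
from fractions import Fraction


def doubling_constant(A):
    """sigma[A] = |A + A| / |A| for a finite non-empty set of integers."""
    A = set(A)
    return Fraction(len({a + b for a in A for b in A}), len(A))
```

For $N = 10$ this gives $11/2$ for the geometric progression $\{1, 2, \dots, 2^9\}$ and $19/10 = 2 - 1/N$ for an arithmetic progression of length 10, while a rank 2 progression such as $\{n_1 + 1000 n_2 : 0 \le n_1 < 5,\ 0 \le n_2 < 6\}$ has doubling constant $99/30$, which is below 4.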

We have just remarked that generalized arithmetic progressions have small doubling constant. One of the fundamental theorems in this subject is Freiman's theorem, which asserts a partial converse to this claim. Freiman's theorem shows that any finite subset of the integers with small doubling constant can be efficiently contained in a generalized arithmetic progression (of bounded rank). This theorem is very useful, but is rather deep, and we will defer its proof to Section 5.4. It also has the drawback that some of the constants in this theorem depend exponentially on the doubling constant $\sigma[A]$. As such, it tends to only be useful in contexts where the doubling constant $\sigma[A]$ is of the order of $\log |A|$ or smaller.

Roughly speaking, one can classify results in inverse sum set theory by the range of $\sigma[A]$ for which the results are non-trivial. The case $\sigma[A] = 1$ is group theory (see Proposition 2.7). When $\sigma[A]$ is very small, e.g. $\sigma[A] < 2$ or $\sigma[A] < 3$, we have a complete characterization of the inverse problem, characterizing $A$ in terms of groups and arithmetic progressions (see Corollary 5.6, Theorem 5.11). When $\sigma[A] = O(\log |A|)$, the best result is Freiman's theorem, which characterizes $A$ in terms of generalized arithmetic progressions. When $\sigma[A] = O(|A|^\varepsilon)$ for some small $\varepsilon$, we have Proposition 2.26 (as well as many of the other results in this chapter), which characterizes $A$ in terms of approximate groups. In the remaining cases $|A|^\varepsilon < \sigma[A] < |A|$, some of the estimates here are still useful, but our understanding is still quite poor.

We will not prove Freiman's theorem in this chapter. However, we will develop the more elementary theory of sum set estimates, which can be used as substitutes for Freiman's theorem in some cases and are also of interest in their own right; this theory will also be needed in the proof of Freiman's theorem later on. These estimates are obtained by very simple combinatorial considerations, and rely on simple arithmetic facts such as $a - c = (a - b) + (b - c)$ and $a + b = a' + b' \iff a - b' = a' - b$. Because of the simplicity of the techniques used here, the results in this section are quite general, being applicable to any additive group and even to a large extent to non-abelian groups (see Section 2.7); we will wait until Chapter 5 before developing sum set estimates which exploit the specific structure of the ambient group (though see also Section 3.4). Also, the bounds obtained here are fairly reasonable; for instance, the dependence of constants on the doubling constant $\sigma[A]$ is only polynomial in all the results in this section (in contrast to the exponential dependence on $\sigma[A]$ in Freiman's theorem). In some cases, though, the results in this section will be superseded by more precise results proven using advanced techniques, which we will address in later sections; for instance, in Section 6.5 we shall develop the theory of Plünnecke inequalities, which give more precise control on iterated sum sets and also handle the case when $A$ and $B$ have very different sizes, a case which is not treated efficiently by the tools in this section.

There are a large number of results in this chapter, but we point out a couple of specific results proven here which have a very large number of applications. The first is Ruzsa's triangle inequality, Lemma 2.6, which allows us to define a "metric" on the space of additive sets and which measures how small their sum sets are. Then there is Corollary 2.12, which links the size of $|A + B|$ and $|A - B|$ for arbitrary additive sets $A, B$. This generalizes to the iterated sum set estimates in Corollary 2.23 and Corollary 2.24. Another very useful class of tools are the covering lemmas: Ruzsa's covering lemma (Lemma 2.14), Green-Ruzsa's covering lemma (Lemma 2.17), and Chang's covering lemma (Lemma 5.31), which give conditions under which one set $A$ can be efficiently covered by translates of another set $B$. These results are collected together in Proposition 2.26 and Proposition 2.27, which characterize sets with small sum set in terms of approximate groups. Last, but certainly not least, there is the Balog-Szemerédi-Gowers theorem, which generalizes the previous results to the setting when one has only partial information on a sum set (or equivalently, one only controls the "additive energy" between two sets); see Theorem 2.29 and Theorem 2.31. We also develop an asymmetric version of this theorem in Section 2.6.




2.1 Sum sets 


We now systematically study the sum sets $A + B$ and difference sets $A - B$ of two additive sets $A, B$ in an ambient group $Z$ as defined in Definition 0.1, as well as the iterated sum sets $nA$. We should caution the reader that the iterated sum set $nA$ is in general not the same as the dilate $n \cdot A := \{n \cdot a : a \in A\}$, though we do have the inclusion $n \cdot A \subseteq nA$. Similarly the difference set $A - B$ should not be confused with the set-theoretic difference $A \setminus B := \{x \in A : x \notin B\}$. We also write $A + x = A + \{x\}$ for the translate of $A$ by an element $x \in Z$.

Since addition of group elements is associative and commutative, one can easily 
verify the same is true for addition of sets. We should caution however that the sum 
set operation is not invertible: for instance, A + B — B contains A but is generally 
not equal to A. Similarly, when n > m, then nA — mA will contain (n — m)A but 
will generally be larger. 

A very fundamental question in this topic is the following: under what conditions is $A + B$ "small", and under what conditions is it "large"? More precisely, we will be interested in the cardinality $|A + B|$ of the sum set $A + B$. We have the following trivial estimates:


Lemma 2.1 (Trivial sum set estimates) Let $A, B$ be additive sets with common ambient group $Z$, and let $x \in Z$. Then we have the identities $|A + x| = |-A| = |A|$, the inequalities

$$\max(|A|, |B|) \le |A + B|,\ |A - B| \le |A||B| \qquad (2.1)$$

and the inequalities

$$|A| \le |A + A| \le \frac{|A|(|A| + 1)}{2}. \qquad (2.2)$$

More generally, for any integer $n \ge 1$, we have $|(n + 1)A| \ge |nA|$ and

$$|nA| \le \binom{|A| + n - 1}{n} = \frac{|A|(|A| + 1) \cdots (|A| + n - 1)}{n!}. \qquad (2.3)$$


We remark that the lower bound in (2.1) can be improved for specific groups 
Z, or when A and B have large “dimension”; see Theorem 3.16, Lemma 5.3, 
Theorem 5.17, Corollary 5.13, Theorem 5.4. 


Proof We shall just prove (2.3), as all the other inequalities either follow from this inequality or are trivial. We argue by induction on $|A|$. If $|A| = 1$ then both sides of (2.3) are equal to 1. If $|A| > 1$, then we can write $A = B \cup \{x\}$ where $B$ is a non-empty set with $|B| = |A| - 1$. Then

$$nA = \bigcup_{j=0}^{n} (jB + (n - j) \cdot x)$$

and hence by the induction hypothesis and Pascal's triangle identity

$$|nA| \le \sum_{j=0}^{n} |jB| \le \sum_{j=0}^{n} \binom{|A| - 1 + j - 1}{j} = \binom{|A| + n - 1}{n}$$

as claimed. (We adopt the convention that $0B = \{0\}$.)
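The bound (2.3) and the two extremes of behaviour can be observed on small sets of integers. The Python sketch below (the function name is illustrative) computes $|nA|$ directly, with the convention $0A = \{0\}$ used in the proof.

```python
from math import comb


def iterated_sumset_size(A, n):
    """|nA| for a finite set of integers A, with the convention 0A = {0}."""
    result = {0}
    for _ in range(n):
        result = {s + a for s in result for a in A}
    return len(result)
```

For the "generic" set $\{1, 10, 100, 1000\}$ equality holds in (2.3) for $n \le 4$, whereas for the arithmetic progression $\{0, 1, 2, 3, 4\}$ one has $|nA| = 4n + 1$, far below the binomial bound.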


Observe from the above facts that the cardinalities of sum sets such as $A + B$, $A - B$, $kA$ are unaffected if one translates $A$ or $B$ by an arbitrary amount. This gives much of the theory of sum sets a "translation-invariant" or "affine" flavor. We will sometimes take advantage of this translation invariance to normalize one of the sets, for instance to contain the origin 0.

For "generic" additive sets $A$ and $B$, the cardinalities of the sum sets considered in Lemma 2.1 are much more likely to be closer to the upper bounds listed above than the lower bounds; see for instance Exercise 2.1.1. This suggests that the lower bounds are only attainable, or close to being attainable, when the sets $A$ and $B$ have a considerable amount of structure; we shall develop this theme in the remainder of this chapter, by introducing tools such as doubling and difference constants, Ruzsa distance, additive energy, and $K$-approximate groups to quantify some of these notions of "structure". For now, we at least settle the question of when the lower bound in (2.1) is attained.


Proposition 2.2 (Exact inverse sum set theorem) Suppose that $A, B$ are additive
sets with common ambient group $Z$. Then the following are equivalent:

• $|A + B| = |A|$;

• $|A - B| = |A|$;

• $|A + nB - mB| = |A|$ for at least one pair of integers $(n, m) \ne (0, 0)$;

• $|A + nB - mB| = |A|$ for all integers $n, m$;

• there exists a finite subgroup $G$ of $Z$ such that $B$ is contained in a coset of $G$,
and $A$ is a union of cosets of $G$.


Proof We shall just show that the first claim implies the fifth; the remaining
claims are either similar or easy and are left to the exercises. By translating $B$
if necessary we may assume that $B$ contains 0. Then $A + B \supseteq \{0\} + A = A$, but
since $|A + B| = |A|$ we have $A + B = A$. In particular $A + b = A$ for all $b \in B$.
Thus if we define the symmetry group $\mathrm{Sym}_1(A)$ (also known as the period of
$A$) to be the set $\mathrm{Sym}_1(A) := \{h \in Z : A + h = A\}$, then we have $B \subseteq \mathrm{Sym}_1(A)$.
We leave as an exercise for the reader the verification that $\mathrm{Sym}_1(A)$ is a finite
group, and $A$ is the union of cosets of $\mathrm{Sym}_1(A)$; the claim then follows by setting
$G := \mathrm{Sym}_1(A)$.
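
To make the role of the symmetry group concrete, here is a small sketch in the cyclic group $Z_n$ (the choice of ambient group and the function names `symmetry_group`, `is_union_of_cosets` and `sumset_mod` are assumptions made purely for illustration). It computes $\{h : A + h = A\}$ by brute force and checks the structure described in the fifth claim of Proposition 2.2.

```python
def symmetry_group(A, n):
    """Return {h in Z_n : A + h = A} for a subset A of the cyclic group Z_n."""
    A = {a % n for a in A}
    return {h for h in range(n) if {(a + h) % n for a in A} == A}

def is_union_of_cosets(A, G, n):
    """Check that a + G is contained in A for every a in A (arithmetic mod n)."""
    A = {a % n for a in A}
    return all({(a + g) % n for g in G} <= A for a in A)

def sumset_mod(A, B, n):
    """A + B inside Z_n."""
    return {(a + b) % n for a in A for b in B}

# Example in Z_12: A is a union of two cosets of G = {0, 4, 8},
# and B = {3, 7} lies in the single coset 3 + G.
A = {1, 5, 9, 2, 6, 10}
B = {3, 7}
G = symmetry_group(A, 12)
```

In this example `G` is the subgroup `{0, 4, 8}`, `A` is a union of cosets of `G`, and `sumset_mod(A, B, 12)` has the same cardinality as `A`, as the proposition predicts.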














We shall study the symmetry group $\mathrm{Sym}_1(A)$, as well as the more general
symmetry sets $\mathrm{Sym}_\alpha(A)$, more systematically in Section 2.6.

As to when the upper bound is attained, we do not have as explicit a description, 
but we can give a number of equivalent formulations of the condition. 


Proposition 2.3 Suppose that $A, B$ are additive sets with common ambient group
$Z$. Then the following are equivalent:

• $|A + B| = |A||B|$;

• $|A - B| = |A||B|$;

• $|\{(a, a', b, b') \in A \times A \times B \times B : a + b = a' + b'\}| = |A||B|$;

• $|\{(a, a', b, b') \in A \times A \times B \times B : a - b = a' - b'\}| = |A||B|$;

• $|A \cap (x - B)| = 1$ for all $x \in A + B$;

• $|A \cap (B + y)| = 1$ for all $y \in A - B$;

• $(A - A) \cap (B - B) = \{0\}$.





We leave the easy proof of this proposition to the exercises. For a partial gen- 
eralization of it, see Corollary 2.10 below. 

In Proposition 2.2 and Proposition 2.3, the sets $A + B$ and $A - B$ have the
same size (see also Exercise 2.1.6). However, this is not true in general. A basic
example is the set $A = \{0, 1, 3\} \subset \mathbf{Z}$; then $A + A = \{0, 1, 2, 3, 4, 6\}$ has six
elements and $A - A = \{-3, -2, -1, 0, 1, 2, 3\}$ has seven elements. More generally,
if $A = \{0, 1, 3\}^d \subset \mathbf{Z}^d$, then $A + A$ has $6^d$ elements and $A - A$ has $7^d$. Thus $A - A$
can be larger than $A + A$ by an arbitrarily large amount. In the converse direction,
the set $A := \{(0, 0), (1, 0), (2, 0), (3, 1), (4, 0), (5, 1), (6, 1), (7, 0), (8, 1), (9, 1)\}
\subset \mathbf{Z}_{10} \times \mathbf{Z}_2$ is such that $A + A = \mathbf{Z}_{10} \times \mathbf{Z}_2$ has 20 elements, but
$A - A = \mathbf{Z}_{10} \times \mathbf{Z}_2 \setminus \{(0, 1)\}$ has only 19 elements; one can amplify this example
as before by raising to the power $d$. Despite these examples, however, there are still
several relationships between the sizes of $|A + A|$ and $|A - A|$; see in particular (2.11) below.
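
Both examples above can be confirmed by direct enumeration. The sketch below is only an illustration; the function names `integer_example` and `cyclic_example` are ours. The first function handles $\{0, 1, 3\}^d$ in $\mathbf{Z}^d$, the second the ten-element subset of $\mathbf{Z}_{10} \times \mathbf{Z}_2$.

```python
from itertools import product

def integer_example(d):
    """Return (|A + A|, |A - A|) for A = {0, 1, 3}^d inside Z^d."""
    A = set(product((0, 1, 3), repeat=d))
    sums = {tuple(x + y for x, y in zip(p, q)) for p in A for q in A}
    diffs = {tuple(x - y for x, y in zip(p, q)) for p in A for q in A}
    return len(sums), len(diffs)

def cyclic_example():
    """Return (A + A, A - A) for the ten-element subset of Z_10 x Z_2."""
    A = {(0, 0), (1, 0), (2, 0), (3, 1), (4, 0),
         (5, 1), (6, 1), (7, 0), (8, 1), (9, 1)}
    sums = {((p[0] + q[0]) % 10, (p[1] + q[1]) % 2) for p in A for q in A}
    diffs = {((p[0] - q[0]) % 10, (p[1] - q[1]) % 2) for p in A for q in A}
    return sums, diffs
```

Here `integer_example(1)` gives `(6, 7)` and `integer_example(2)` gives `(36, 49)`, while `cyclic_example()` returns a sum set of size 20 and a difference set of size 19 that omits `(0, 1)`.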





Exercises 


2.1.1 Let $N, M \ge 1$ be integers, and let $A$ and $B$ be sets of cardinality $N$ and
$M$ respectively chosen uniformly at random from the real interval
$\{x \in \mathbf{R} : 0 \le x \le 1\}$. Show that with probability 1 we have $|A + B| = |A||B|$
and $|nA| = \binom{N + n - 1}{n}$ for all $n \ge 1$.

2.1.2 Prove the remaining claims in Proposition 2.2.

2.1.3 Let $A$ be an additive set. Show that $A$ is a group if and only if $2A = A$.

2.1.4 Prove Proposition 2.3.

2.1.5 [289] Find an additive set $A$ of integers such that $|A - A| < |A + A|$.
(Hint: there are several ways to proceed. One way is to tile the lattice $\mathbf{Z}^2$
with the $\mathbf{Z}_{10} \times \mathbf{Z}_2$ example given above, and somehow truncate and then
project this back to $\mathbf{Z}$.)

2.1.6 Let $A, B$ be additive sets in a finite additive group $Z$, such that $|A| + |B| > |Z|$.
Prove that $A + B = A - B = Z$. Give an example to show that the
condition $|A| + |B| > |Z|$ cannot be improved.

2.1.7 Show that for any additive set $A$, the symmetry group $\mathrm{Sym}_1(A)$ of $A$
as defined in the proof of Proposition 2.2 is a finite group contained in
$A - A$, obeys the identity $A = A + \mathrm{Sym}_1(A)$, and that $A$ is a union of
cosets of $\mathrm{Sym}_1(A)$. (We shall define a more general notion of symmetry
sets $\mathrm{Sym}_\alpha(A)$ of an additive set in Section 2.6.)

2.1.8 Let $d \ge 1$. Give an example of an additive set $A$ of integers such that
$|A + A| = 6^d$ and $|A - A| = 7^d$. (See also Lemma 5.25.)


2.2 Doubling constants 


The traditional way to measure the additive structure inside an additive set $A$ is
via doubling constants $\sigma[A]$, which we now define. We will shortly develop two
other measures of additive structure, namely the additive energy $E(A, A)$, and the
concept of a $K$-approximate group, which are also useful, and are closely related
to the doubling constant.


Definition 2.4 (Doubling constant) For an additive set $A$, the doubling constant
$\sigma[A]$ is defined to be the quantity
\[ \sigma[A] := \frac{|2A|}{|A|} = \frac{|A + A|}{|A|}. \]
Similarly we define the difference constant $\delta[A]$ as
\[ \delta[A] := \frac{|A - A|}{|A|}. \]

From (2.2), together with the observation that $A - A$ contains 0 and at most
$|A|(|A| - 1)$ non-zero differences, we thus have the bounds
\[ 1 \le \sigma[A] \le \frac{|A| + 1}{2}; \qquad 1 \le \delta[A] \le |A| - 1 + \frac{1}{|A|}. \]

The upper bound here is quite easy to attain; for instance if $A = 2^{[0, N)} =
\{1, 2, 2^2, \ldots, 2^{N-1}\} \subset \mathbf{Z}$, then $|A| = N$, $|A + A| = \frac{N(N+1)}{2}$, and
$|A - A| = N(N - 1) + 1$, hence $\sigma[A] = \frac{N+1}{2}$ and $\delta[A] = N - 1 + \frac{1}{N}$. In the converse
direction, Proposition 2.2 shows that $\sigma[A] = 1$ (or $\delta[A] = 1$) if and only if $A$ is a
coset of a group; we shall elaborate upon this in Proposition 2.7 below.

An additive set $A$ with the maximal value of doubling constant $\sigma[A] =
(|A| + 1)/2$ (or equivalently, with maximal difference constant $\delta[A] = |A| - 1 +
\frac{1}{|A|}$) is known as a Sidon set or a $B_2$ set. Informally, this means that all the pairwise
sums of $A$ are distinct, excluding the trivial equalities coming from the identity
$a + b = b + a$; see Exercise 2.2.1. We will revisit Sidon sets in Section 4.5.
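
As a quick sanity check on these definitions, the following sketch computes doubling and difference constants exactly for finite sets of integers, using rational arithmetic. The function names are ours and are chosen only for this illustration.

```python
from fractions import Fraction

def doubling_constant(A):
    """sigma[A] = |A + A| / |A| as an exact fraction."""
    return Fraction(len({a + b for a in A for b in A}), len(A))

def difference_constant(A):
    """delta[A] = |A - A| / |A| as an exact fraction."""
    return Fraction(len({a - b for a in A for b in A}), len(A))

def is_sidon(A):
    """A is a Sidon set when |A + A| attains the upper bound |A|(|A| + 1)/2."""
    return len({a + b for a in A for b in A}) == len(A) * (len(A) + 1) // 2

powers = {2 ** k for k in range(6)}      # the set {1, 2, 4, 8, 16, 32}
progression = set(range(6))              # the progression {0, 1, ..., 5}
```

With $N = 6$, `doubling_constant(powers)` is `Fraction(7, 2)` and `difference_constant(powers)` is `Fraction(31, 6)`, matching $(N+1)/2$ and $N - 1 + 1/N$; the progression, by contrast, has both constants equal to `Fraction(11, 6)`.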


There are various senses in which this behavior is “generic”; for instance, if $A$
is a set of $N$ real numbers chosen uniformly at random from the unit interval
$\{x \in \mathbf{R} : 0 \le x \le 1\}$, then we see from Exercise 2.1.1 that $A$ is a Sidon set with
probability 1, and so $|A + A| = \frac{N(N+1)}{2}$; the point is that if $\{a, b\} \ne \{c, d\}$ then
$a + b$ and $c + d$ will “generically” be distinct. A more interesting question is to
understand the conditions under which the doubling constant $\sigma[A]$ (or difference
constant $\delta[A]$) can be small.

As mentioned earlier, $\sigma[A] = 1$ if and only if $A$ is the coset of a finite subgroup
$G$ of $Z$. We thus expect that if $A$ has a doubling constant which is small, but not
actually equal to 1, then it should behave “approximately” like a group (up to
translations); we shall see several manifestations of this heuristic throughout this
book, when we develop more tools with which to analyze the doubling constant.
Indeed, the study of sets of small doubling constant can be thought of as a kind of
“approximate group theory”, with the inverse sum set theorems of Chapter 5 then
being analogous to a classification theorem for groups.

The study of sets with close to maximal doubling appears to be hopeless at
present. A probabilistic construction of Ruzsa [291] shows that there exist large
additive sets $A$ with $|A - A|$ very close to the maximal value of $|A|^2$, but $|A + A| \le
|A|^{2-c}$ for some explicit absolute constant $c > 0$; and similarly with the roles of
$A - A$ and $A + A$ reversed.


Exercises 


2.2.1 Let $A$ be an additive set. Show that $A$ is a Sidon set if and only if, for any
$a, b, c, d \in A$, we have $a + b \ne c + d$ unless $\{a, b\} = \{c, d\}$.

2.2.2 Let $Z$ be an additive group, let $a, r \in Z$, and let $N \ge 1$ be an integer.
Let $P = \{a, a + r, \ldots, a + (N - 1)r\}$ be an arithmetic progression in $Z$.
Show that $\sigma[P] \le 2 - \frac{1}{N}$, with equality if and only if $\mathrm{ord}(r) \ge 2N - 1$,
where $\mathrm{ord}(r)$ is the order of the group element $r$ in $Z$.

2.2.3 If $\phi : Z' \to Z$ is a surjective group homomorphism whose kernel
$\ker(\phi) := \phi^{-1}(\{0\})$ is finite, and $A$ is an additive set in $Z$, show that
$\sigma[\phi^{-1}(A)] = \sigma[A]$.

2.2.4 If $A, A'$ are additive sets in $Z, Z'$ respectively, show that $\sigma[A \times A'] =
\sigma[A]\sigma[A']$. In particular $\sigma[A^{\oplus d}] = \sigma[A]^d$ for all $d \ge 1$.

2.2.5 Let $A$ be any additive set. Show that a non-empty subset of $A$ can have
doubling constant at most $\sqrt{\sigma[A]|A|/2}$. Give examples that show that
this bound cannot be improved except by an absolute constant. What is
the analogous statement for the difference constant?

2.2.6 [100] Let $A$ be any additive set. Show that a Sidon set contained in $A$
can have cardinality at most $\sqrt{2\sigma[A]|A|}$. (Thus sets with small doubling
constant cannot contain very large Sidon sets.) What is the analogous
statement for the difference constant?

2.2.7 [294] Let $p$ be a prime, let $\theta \in \mathbf{Z}_p \setminus 0$ be a multiplicative generator
of $\mathbf{Z}_p$, and let $Z := \mathbf{Z}_{p-1} \times \mathbf{Z}_p$. Let $A \subset Z$ be the set $A := \{(t, \theta^t) :
t = 1, \ldots, p - 1\}$. Show that $A$ is a Sidon set, and compare this to
Exercise 2.2.6. Modify this construction to give an example of a Sidon
set $A \subset [0, N]$ for a large integer $N$ such that $|A|$ is comparable to
$N^{1/2}$. A similar example can be given by using the discrete parabola
$\{(t, t^2) : t \in \mathbf{Z}_p\}$ in $\mathbf{Z}_p \times \mathbf{Z}_p$. For a survey of other constructions of Sidon
sets, see [264].

2.2.8 Let $N$ be a large integer. Give examples of finite non-empty sets $A, B$ of
integers such that $|A| = |B| = N$ and $\sigma[A], \sigma[B] \le 2$, but $\sigma[A \cup B]$ is
at least a constant multiple of $N$. This example shows that doubling constants
can behave very badly under set union (see however Exercise 2.3.17). On the
other hand, establish the inequality $\sigma[A \cup B] \le \sigma[A] + |B|$; thus adding a
small set to $A$ will not significantly affect the doubling constant.

2.2.9 Let $N$ be a large integer. Give examples of finite non-empty sets $A, B$ of
integers such that $|A| = |B| = N$ and $\sigma[A], \sigma[B] \le 10$, but $\sigma[A \cap B]$ is
at least a constant multiple of $N^{1/2}$. (Hint: concatenate a Sidon set with an
arithmetic progression.) Compare this result against Exercise 2.2.6. This
example shows that doubling constants can behave badly under set
intersection (but see Exercise 2.4.7).

2.2.10 Let $A$ be an additive set in $Z$, and let $\pi : Z \to Z'$ be a group
homomorphism. Show by example that $\sigma[\pi(A)]$ is not necessarily less than
or equal to $\sigma[A]$. (Hint: this is surprisingly delicate. One way is to start
with an additive set $C$ in some additive group $Z_0$ with $\sigma[C] > \delta[C]$, and
consider the additive set $A := ((-C)^n \times \{0\} \times G) \cup (C^n \times X \times \{0\})$ in
$Z_0^n \times Z \times G$, where $n \ge 1$ is large, $G$ is a very large finite group, and $X$
is a Sidon set of medium size in a group $Z$.) See however Exercise 2.3.8
and Exercise 6.5.17.

2.2.11 Let $A$ be an additive set in $Z$, and let $G$ be a finite subgroup of $Z$. Show
by example that $\sigma[A + G]$ is not necessarily less than or equal to $\sigma[A]$.
(Hint: use the previous exercise.)


2.3 Ruzsa distance and additive energy 


The doubling constant measures the amount of internal additive structure of a
single additive set $A$. We now introduce two useful quantities measuring the amount
of common additive structure between two additive sets $A, B$: the Ruzsa distance
and the additive energy.


Definition 2.5 (Ruzsa distance) Let $A$ and $B$ be two additive sets with a common
ambient group $Z$. We define the Ruzsa distance $d(A, B)$ between these two sets
to be the quantity
\[ d(A, B) := \log \frac{|A - B|}{|A|^{1/2} |B|^{1/2}}. \]

Thus for instance $d(A, A) = \log \delta[A]$.
We now justify the terminology “Ruzsa distance”.


Lemma 2.6 (Ruzsa triangle inequality) [297] The Ruzsa distance $d(A, B)$ is
non-negative, symmetric, and obeys the triangle inequality
\[ d(A, C) \le d(A, B) + d(B, C) \]
for all additive sets $A, B, C$ with common ambient group $Z$.


Proof The non-negativity follows from (2.1). The symmetry follows since $B -
A = -(A - B)$. Now we prove the triangle inequality, which we can rewrite as
\[ |A - C| \le \frac{|A - B||B - C|}{|B|}. \]
From the identity
\[ a - c = (a - b) + (b - c) \]
we see that every element $a - c$ in $A - C$ has at least $|B|$ distinct representations
of the form $x + y$ with $(x, y) \in (A - B) \times (B - C)$. The claim then follows.














For an approximate version of this inequality in which one replaces complete 
difference sets with nearly complete difference sets (using at least 75% of the 
differences), see Exercise 2.5.4. 
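
The triangle inequality is most conveniently tested in its multiplicative form $|A - C||B| \le |A - B||B - C|$, which avoids floating-point logarithms. The sketch below assumes finite sets of integers; the names `ruzsa_distance` and `triangle_holds` are ours.

```python
import math
import random

def diffset(A, B):
    """A - B = {a - b : a in A, b in B}."""
    return {a - b for a in A for b in B}

def ruzsa_distance(A, B):
    """d(A, B) = log(|A - B| / (|A|^(1/2) |B|^(1/2)))."""
    return math.log(len(diffset(A, B)) / math.sqrt(len(A) * len(B)))

def triangle_holds(A, B, C):
    """Integer form of the Ruzsa triangle inequality: |A - C| |B| <= |A - B| |B - C|."""
    return len(diffset(A, C)) * len(B) <= len(diffset(A, B)) * len(diffset(B, C))

# Try a few pseudo-random triples of subsets of {0, ..., 49}.
rng = random.Random(0)
triples = [tuple(set(rng.sample(range(50), rng.randint(1, 10))) for _ in range(3))
           for _ in range(200)]
all_ok = all(triangle_holds(A, B, C) for A, B, C in triples)
```

Note that `ruzsa_distance(P, P)` is strictly positive for a progression `P` with more than one element, which illustrates why $d$ is not quite a metric.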

The Ruzsa distance thus satisfies all the axioms of a metric except one: we do not
have $d(A, A) = 0$ for all sets $A$ (also, we have $d(G + x, G + y) = 0$ whenever
$G + x$, $G + y$ are cosets of a finite group $G$). Indeed we have a precise characterization
of when this Ruzsa distance vanishes:


Proposition 2.7 Suppose that $(A, Z)$ is an additive set. Then the following are
equivalent:

• $\sigma[A] = 1$ (i.e. $|A + A| = |A|$);

• $\delta[A] = 1$ (i.e. $|A - A| = |A|$, or $d(A, A) = 0$);

• $d(A, B) = 0$ for at least one additive set $B$;

• $|nA - mA| = |A|$ for at least one pair of non-negative integers $n, m$ with
$n + m \ge 2$;

• $|nA - mA| = |A|$ for all non-negative integers $n, m$;

• $A$ is a coset of a finite subgroup $G$ of $Z$.














Proof Apply Proposition 2.2 and the Ruzsa triangle inequality. 


Later on in this chapter we shall generalize this proposition to the case when 
the Ruzsa distance, difference constant, or doubling constant are a little larger than 
0, 0, or 1 respectively, but still fairly small; see Proposition 2.26. 

Despite the non-vanishing of the distance $d(A, A)$ in general, it is still a useful
heuristic to view the Ruzsa distance as behaving like a metric¹. Now we relate the
difference constant to the doubling constant. From the definition of Ruzsa distance
and doubling constant we have the identity
\[ d(A, -A) = \log \sigma[A]. \qquad (2.4) \]
In particular, from Lemma 2.6 we have
\[ \log \delta[A] = d(A, A) \le 2 \log \sigma[A] \]
and hence we obtain the estimate
\[ \delta[A] \le \sigma[A]^2 \qquad (2.5) \]
or in other words that $|A - A| \le \frac{|A + A|^2}{|A|}$. A similar argument gives the more general
estimate
\[ |B - B| \le \frac{|A + B|^2}{|A|} \qquad (2.6) \]
for any two additive sets $A, B$ with common ambient group $Z$.

It turns out that we can conversely bound the doubling constant of a set by its
difference constant; see (2.11) below.

Having introduced the Ruzsa distance, we now turn to the closely related notion 
of additive energy E(A, B) between two additive sets. 


Definition 2.8 (Additive energy) If $A$ and $B$ are two additive sets with ambient
group $Z$, we define the additive energy $E(A, B)$ between $A$ and $B$ to be the quantity
\[ E(A, B) := |\{(a, a', b, b') \in A \times A \times B \times B : a + b = a' + b'\}|. \]


1 One could artificially convert the Ruzsa distance into a genuine metric by identifying $A$ with $A + x$
for all $x$, and redefining $d(A, A)$ to be zero, or alternatively by introducing the metric space
$X := \{A \times \{j\} : A \subset Z; 0 < |A| < \infty; j \in \{1, 2\}\}$, consisting of two copies of each finite
non-empty subset of $Z$ (again identifying $A$ with its translations), with the metric
$d_X(A \times \{j\}, B \times \{k\})$ defined to equal $d(A, B)$ if $A \times \{j\} \ne B \times \{k\}$ and equal to 0 otherwise.
However there appears to be no significant advantage in working in such an artificial setting.


We observe the trivial bounds
\[ |A||B| \le E(A, B) \le |A||B| \min(|A|, |B|). \qquad (2.7) \]

The lower bound follows since $a + b = a' + b'$ whenever $(a, b) = (a', b')$. To
see the upper bound, observe that if one fixes $a, a', b$, then $b' = a + b - a'$ is
completely determined, and hence $E(A, B) \le |A|^2|B|$. A similar argument gives
$E(A, B) \le |A||B|^2$. Note that Proposition 2.3 addresses the case when $E(A, B) =
|A||B|$.
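
The counting argument for the upper bound translates directly into code: fix $a, a', b$ and test whether the forced value $b' = a + b - a'$ lies in $B$. The following sketch, for finite sets of integers, uses the hypothetical function names `additive_energy` and `trivial_energy_bounds_hold`.

```python
def additive_energy(A, B):
    """E(A, B): number of (a, a', b, b') in A x A x B x B with a + b = a' + b'.

    For fixed a, a', b the element b' = a + b - a' is forced, so it suffices
    to test membership of that single value in B.
    """
    return sum(1 for a in A for a2 in A for b in B if a + b - a2 in B)

def trivial_energy_bounds_hold(A, B):
    """Check the bounds (2.7): |A||B| <= E(A, B) <= |A||B| min(|A|, |B|)."""
    E = additive_energy(A, B)
    size = len(A) * len(B)
    return size <= E <= size * min(len(A), len(B))
```

For example, the progression `{0, 1, 2, 3}` has `additive_energy` equal to 44 with itself, whereas the Sidon set `{1, 2, 4, 8}` has energy 28, much closer to the lower bound.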

We will analyze the additive energy more comprehensively in Section 4.2, 
when we have developed the machinery of Fourier transforms, and in Section 2.5, 
when we have developed the Balog—Szemerédi—Gowers theorem. For now we 
concentrate on the elementary properties of this energy. We first observe the 
symmetry property E(A, B) = E(B, A) and the translation invariance property 
$E(A + x, B + y) = E(A, B)$ for all $x, y \in Z$. From the trivial observation
\[ a + b = a' + b' \iff a - b' = a' - b \]
we also see that $E(A, B) = E(A, -B)$, and similarly if we reflect $A$ to $-A$.
The additive energy reflects the extent to which A intersects with translates of 
B or —B, as the following simple identities show: 


Lemma 2.9 Let $A, B$ be additive sets with ambient group $Z$. Then we have the
identities
\[ |A||B| = \sum_{x \in A + B} |A \cap (x - B)| = \sum_{y \in A - B} |A \cap (B + y)| \]
and
\[ E(A, B) = \sum_{x \in A + B} |A \cap (x - B)|^2 = \sum_{y \in A - B} |A \cap (B + y)|^2 = \sum_{z \in (A - A) \cap (B - B)} |A \cap (z + A)||B \cap (z + B)|. \]
In particular, if we let $r_{A+B}(n)$ denote the number of representations of $n$ as $a + b$
for some $a \in A$ and $b \in B$, and define $r_{A-B}(n)$ similarly, then we have
\[ |A||B| = \sum_n r_{A+B}(n) = \sum_n r_{A-B}(n); \qquad E(A, B) = \sum_n r_{A+B}(n)^2 = \sum_n r_{A-B}(n)^2. \]


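Proof A simple counting argument yields
\[ |A||B| = \sum_{x \in A + B} |\{(a, b) \in A \times B : a + b = x\}| = \sum_{x \in A + B} |A \cap (x - B)|. \]
By replacing $B$ with $-B$ we similarly obtain $|A||B| = \sum_{y \in A - B} |A \cap (B + y)|$.
This gives the first set of identities. For the second set we compute
\begin{align*}
\sum_{x \in A + B} |A \cap (x - B)|^2
&= \sum_{x \in A + B} |\{(a, b) \in A \times B : a + b = x\}|^2 \\
&= \sum_{x \in A + B} |\{(a, a', b, b') \in A \times A \times B \times B : a + b = a' + b' = x\}| \\
&= |\{(a, a', b, b') \in A \times A \times B \times B : a + b = a' + b'\}| \\
&= |\{(a, a', b, b') \in A \times A \times B \times B : a - b' = a' - b\}| \\
&= \sum_{y \in A - B} |\{(a, b') \in A \times B : a - b' = y\}|^2 \\
&= \sum_{y \in A - B} |A \cap (B + y)|^2
\end{align*}
and
\begin{align*}
\sum_{z \in (A - A) \cap (B - B)} |A \cap (z + A)||B \cap (z + B)|
&= \sum_{z \in (A - A) \cap (B - B)} |\{(a, a', b, b') \in A \times A \times B \times B : z = a - a' = b' - b\}| \\
&= |\{(a, a', b, b') \in A \times A \times B \times B : a - a' = b' - b\}| \\
&= |\{(a, a', b, b') \in A \times A \times B \times B : a + b = a' + b'\}|
\end{align*}
and the claims follow from the definition of $E(A, B)$. The last identity follows
since $r_{A+B}(n) = |A \cap (n - B)|$ and $r_{A-B}(n) = |A \cap (B + n)|$.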
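
The identities of Lemma 2.9 can be checked mechanically by tabulating the representation functions. The sketch below works with finite sets of integers; `energy` and `lemma_2_9_holds` are hypothetical helper names, and `Counter` objects play the role of $r_{A+B}$ and $r_{A-B}$.

```python
from collections import Counter

def energy(A, B):
    """E(A, B) computed directly from Definition 2.8."""
    return sum(1 for a in A for a2 in A for b in B if a + b - a2 in B)

def lemma_2_9_holds(A, B):
    """Check every identity of Lemma 2.9 for finite integer sets A and B."""
    r_plus = Counter(a + b for a in A for b in B)    # r_{A+B}
    r_minus = Counter(a - b for a in A for b in B)   # r_{A-B}
    first = sum(r_plus.values()) == sum(r_minus.values()) == len(A) * len(B)

    # Sum over z in (A - A) n (B - B) of |A n (z + A)| |B n (z + B)|.
    common = {a - a2 for a in A for a2 in A} & {b - b2 for b in B for b2 in B}
    via_z = sum(len(A & {z + a for a in A}) * len(B & {z + b for b in B})
                for z in common)

    E = energy(A, B)
    second = (E == sum(c * c for c in r_plus.values())
                == sum(c * c for c in r_minus.values())
                == via_z)
    return first and second
```

Calling `lemma_2_9_holds({0, 1, 3}, {0, 5, 7, 11})` returns `True`; in practice the formula $E(A, B) = \sum_n r_{A+B}(n)^2$ is also the cheapest way to compute the energy.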














As a consequence of this Lemma we have the following inequalities, which 
assert that pairs of sets with small Ruzsa distance have large additive energy, 
and pairs with large additive energy have large intersection (after translating and 
possibly reflecting one of the sets). 


Corollary 2.10 Let $A, B$ be additive sets. Then there exist $x \in A + B$ and $y \in
A - B$ such that
\[ |A \cap (x - B)|, |A \cap (B + y)| \ge \frac{E(A, B)}{|A||B|} \ge \frac{|A||B|}{|A \pm B|} \qquad (2.8) \]
for either choice of sign $\pm$. In particular all of the above quantities are bounded
above by $|(A - A) \cap (B - B)|$. Finally we have the Cauchy–Schwarz inequality
\[ E(A, B) \le E(A, A)^{1/2} E(B, B)^{1/2}. \qquad (2.9) \]


Proof From Lemma 2.9 and Cauchy–Schwarz we have
\[ \frac{E(A, B)}{|A||B|} \ge \frac{|A||B|}{|A \pm B|}. \]
Also, from the last part of Lemma 2.9 we have
\[ E(A, B) \le |A||B| \max_{x \in A + B} r_{A+B}(x), \quad |A||B| \max_{y \in A - B} r_{A-B}(y) \]
which establishes (2.8). To bound $|A \cap (x - B)|$ and $|A \cap (B + y)|$, observe that if
$z \in A \cap (x - B)$, then $A \cap (x - B) \subseteq z + ((A - A) \cap (B - B))$, hence
$|A \cap (x - B)| \le |(A - A) \cap (B - B)|$, and similarly $|A \cap (B + y)| \le |(A - A) \cap (B - B)|$.
Finally, (2.9) follows from the formula $E(A, B) = \sum_{z \in (A - A) \cap (B - B)} |A \cap
(z + A)||B \cap (z + B)|$ from Lemma 2.9 and the Cauchy–Schwarz inequality.
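
The inequalities (2.8) and (2.9) can likewise be tested in integer form, clearing denominators and squaring so that no division or square root is needed. This is only a sketch for finite sets of integers, with the hypothetical name `corollary_2_10_holds`; it takes $x$ and $y$ to be the most popular sum and difference.

```python
from collections import Counter

def corollary_2_10_holds(A, B):
    """Check (2.8) for both signs and (2.9), using integer arithmetic only."""
    size = len(A) * len(B)
    r_plus = Counter(a + b for a in A for b in B)
    r_minus = Counter(a - b for a in A for b in B)
    E = sum(c * c for c in r_plus.values())          # E(A, B) via Lemma 2.9
    for r in (r_plus, r_minus):
        # max_x r(x) >= E / (|A||B|)  and  E / (|A||B|) >= |A||B| / |A +- B|
        if not (max(r.values()) * size >= E and E * len(r) >= size * size):
            return False
    E_AA = sum(c * c for c in Counter(a + b for a in A for b in A).values())
    E_BB = sum(c * c for c in Counter(a + b for a in B for b in B).values())
    return E * E <= E_AA * E_BB                      # squared form of (2.9)
```

For instance `corollary_2_10_holds({0, 1, 3}, {0, 5, 7, 11})` returns `True`.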











Another connection in a similar spirit is 





Lemma 2.11 Let $A, B$ be additive sets. Then for any $x \in A + B$ we have
\[ |A \cap (x - B)| \le \frac{|A - B|^2}{|A + B|}. \]


Proof (Lev Vsevolod, private communication) We can rewrite the inequality as
\[ |\{(a, b, c) \in A \times B \times (A + B) : a + b = x\}| \le |(A - B) \times (A - B)|. \]
Now for each $(a, b, c)$ in the set on the left-hand side, we can write $c = a_c + b_c$
for some $a_c \in A$, $b_c \in B$, and then form the pair $(a - b_c, a_c - b) \in (A - B) \times
(A - B)$. Using the identity $c = x - (a - b_c) + (a_c - b)$ we can verify that this
map is injective. The claim follows.

















Corollary 2.12 Let $A, B$ be additive sets with ambient group $Z$. Then there exists
$x \in A + B$ such that
\[ |A + B| \le \frac{|A - B|^2}{|A \cap (x - B)|} \le \frac{|A - B|^2 |A||B|}{E(A, B)} \le \frac{|A - B|^3}{|A||B|}. \qquad (2.10) \]
Furthermore we have
\[ d(A, -B) \le 3 d(A, B). \]


Proof The inequalities in (2.10) follow from (2.8), and the final inequality 
d(A, —B) < 3d(A, B) then follows from Lemma 2.11 and the definition of Ruzsa 
distance. 














From (2.10) and (2.5) we obtain the inequalities
\[ \delta[A]^{1/2} \le \sigma[A] \le \delta[A]^3 \qquad (2.11) \]
which were first observed in [289]. Thus an additive set has small doubling constant
if and only if its difference constant is small. It is not known whether the lower
bound is best possible. However, the upper bound can be improved to $\sigma[A] \le \delta[A]^2$
using Plünnecke inequalities; see Exercise 6.5.15.
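
Written in terms of cardinalities, (2.11) says that $|A - A||A| \le |A + A|^2$ and $|A + A||A|^2 \le |A - A|^3$. The following sketch tests exactly these two integer inequalities on finite sets of integers; the function name `check_2_11` is ours.

```python
import random

def check_2_11(A):
    """Check delta[A]^(1/2) <= sigma[A] <= delta[A]^3 in integer form."""
    n = len(A)
    s = len({a + b for a in A for b in A})   # |A + A|
    d = len({a - b for a in A for b in A})   # |A - A|
    lower = d * n <= s * s                   # delta <= sigma^2
    upper = s * n * n <= d ** 3              # sigma <= delta^3
    return lower and upper

# A few pseudo-random subsets of {0, ..., 99}.
rng = random.Random(1)
samples = [set(rng.sample(range(100), rng.randint(1, 15))) for _ in range(200)]
all_ok = all(check_2_11(A) for A in samples)
```

For the set `{0, 1, 3}` the two inequalities read $21 \le 36$ and $54 \le 343$.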

We now show how the Ruzsa distance can be used to control iterated sum sets. 
We begin with a lemma which controls iterated sum sets of “most” of A + B. 


Lemma 2.13 Let $A$ and $B$ be additive sets in a common ambient group. Then
there exists $S \subseteq A + B$ such that
\[ |\{(a, b) \in A \times B : a + b \in S\}| \ge |A||B|/2 \qquad (2.12) \]
and such that
\[ |A + B + nS| \le \frac{2^n |A + B|^{2n+1}}{|A|^n |B|^n} \qquad (2.13) \]
for all integers $n \ge 0$.


Note that (2.12) gives a lower bound on $|S|$, namely
\[ |S| \ge \max(|A|, |B|)/2. \qquad (2.14) \]
Proof If we define $S$ to be the set of all $x \in A + B$ such that
\[ |\{(a, b) \in A \times B : a + b = x\}| \ge \frac{|A||B|}{2|A + B|} \]
then we have
\[ |\{(a, b) \in A \times B : a + b \in (A + B) \setminus S\}| \le |A + B| \frac{|A||B|}{2|A + B|} \]
which gives (2.12).
Now we prove (2.13). A typical element of $A + B + nS$ can be written as
\[ a_0 + s_1 + s_2 + \cdots + s_n + b_{n+1} \]
where $a_0 \in A$, $b_{n+1} \in B$, and $s_1, \ldots, s_n \in S$. By definition of $S$, we can expand
this in at least $\bigl(\frac{|A||B|}{2|A + B|}\bigr)^n$ different ways as
\[ a_0 + (b_1 + a_1) + (b_2 + a_2) + \cdots + (b_n + a_n) + b_{n+1} \]
where $b_i \in B$, $a_i \in A$, and $b_i + a_i = s_i$ for all $1 \le i \le n$. We regroup this as the
sum of $n + 1$ elements from $A + B$,
\[ (a_0 + b_1) + (a_1 + b_2) + \cdots + (a_n + b_{n+1}) \]
and observe that for fixed $a_0, s_1, \ldots, s_n, b_{n+1}$, the quantities $a_0 + b_1, a_1 +
b_2, \ldots, a_n + b_{n+1}$ completely determine all the variables $a_0, \ldots, a_n, b_1, \ldots, b_{n+1}$.
Thus we have shown that every element of $A + B + nS$ has at least $\bigl(\frac{|A||B|}{2|A + B|}\bigr)^n$
representations of the form $t_0 + \cdots + t_n$ where each $t_i \in A + B$. The claim then
follows.
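
The set $S$ in this proof is explicit, so the lemma can be tested directly. The sketch below, for finite sets of integers, builds $S$ as the set of "popular" sums and checks (2.12) and (2.13) with denominators cleared; `popular_sums` and `lemma_2_13_holds` are hypothetical names introduced for the illustration.

```python
from collections import Counter

def popular_sums(A, B):
    """S = {x in A + B : r_{A+B}(x) >= |A||B| / (2|A + B|)}."""
    r = Counter(a + b for a in A for b in B)
    return {x for x, c in r.items() if 2 * len(r) * c >= len(A) * len(B)}

def lemma_2_13_holds(A, B, n):
    """Check (2.12) and, for the given n >= 0, (2.13)."""
    r = Counter(a + b for a in A for b in B)
    S = popular_sums(A, B)
    size = len(A) * len(B)
    mass_ok = 2 * sum(r[x] for x in S) >= size                  # (2.12)
    T = set(r)                                                  # A + B
    for _ in range(n):
        T = {t + s for t in T for s in S}                       # A + B + nS
    growth_ok = len(T) * size ** n <= 2 ** n * len(r) ** (2 * n + 1)  # (2.13)
    return mass_ok and growth_ok
```

For example, `lemma_2_13_holds(set(range(6)), {0, 1, 5, 12}, 2)` returns `True`.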
















This result can then be used, together with the Ruzsa triangle inequality, to 
deduce control on iterated sum sets of A and B; see Exercise 2.3.10. However we 
will pursue an approach that gives slightly better bounds in the next section (and 
an even better result will be developed in Section 6.5). 


Exercises 


2.3.1 If $\phi : Z' \to Z$ is a surjective group homomorphism whose kernel
$\ker(\phi) := \phi^{-1}(\{0\})$ is finite, and $A, B$ are additive sets in $Z$, show
that $d(\phi^{-1}(A), \phi^{-1}(B)) = d(A, B)$. Also show that $d(A + x, B + y) =
d(A, B)$ for any $x, y \in Z$.

2.3.2 If $A, B, C, D$ are additive sets in $Z$, show that
\[ d(A, B) - \frac{1}{2} \log |C||D| \le d(A + C, B + D) \le d(A, B) + \log |C - D| \]
and
\[ d(A, B \cup C) \le \max(d(A, B), d(A, C)) + \frac{1}{2} \log 2. \]
If $A', B'$ are additive sets in $Z'$, show that
\[ d(A \times A', B \times B') = d(A, B) + d(A', B'). \]

2.3.3 Let $A, B$ be additive sets with common ambient group. Show
that $d(A, B) \le \frac{1}{2} \log |A| + \frac{1}{2} \log |B|$, and that $d(A, B) = \frac{1}{2} \log |A| +
\frac{1}{2} \log |B|$ if and only if $d(A, -B) = \frac{1}{2} \log |A| + \frac{1}{2} \log |B|$.

2.3.4 Let $A, B, C$ be additive sets in $Z$. Show that
\[ d(A, C) \le d(A, B) + \frac{1}{2} \log \frac{|B|}{|C|} \qquad (2.15) \]
whenever $C \subseteq B$; this shows that the Ruzsa distance $d(A, B)$ is stable
under refinement of one or both of the sets $A, B$. By combining this
inequality with the triangle inequality $d(A, -B) \le d(A, (x - A) \cap B) +
d((x - A) \cap B, -B)$, give another proof of Lemma 2.11.

2.3.5 Show that for any $n \ge 1$, there exists an additive set $A$ such that $|A| = 4^n$,
$|A + A| = 10^n$, and $|2A - A| = 28^n$. Thus it is not possible to obtain an
estimate of the form $|2A - A| = O(\sigma[A]^2 |A|)$.

2.3.6 Let $A, B$ be additive sets with common ambient group. Show that
$e^{-2d(A,B)}|A| \le |B| \le e^{2d(A,B)}|A|$. Thus sets which are close in the Ruzsa
distance are necessarily close in cardinality also. Of course the converse
is far from true.

2.3.7 Let $A, B$ be additive sets with common ambient group $Z$. Show that
$d(A, B) = 0$ if and only if $A, B$ are cosets of the same finite subgroup $G$
of $Z$. (We shall generalize this result later; see Proposition 2.27.)


2.3.8 Let $A$ be an additive set in an additive group $Z$, and let $G$ be a finite
subgroup of $Z$. Show that $\sigma[A + G] \le \frac{|3A|}{|A|}$. (Hint: apply the Ruzsa
triangle inequality to $2A$, $-A$, and $G$.) Conclude that if $\pi : Z \to Z'$ is
a group homomorphism then $\sigma[\pi(A)] \le \frac{|3A|}{|A|}$. One cannot replace the
tripling constant $\frac{|3A|}{|A|}$ with the doubling constant; see Exercise 2.2.10. See
however Exercise 6.5.17.

2.3.9 Let $K$ be a large integer, and let $A = B = \{e_1, \ldots, e_K\}$ be the standard
basis of $\mathbf{Z}^K$. Show that if $S$ is any subset of $A + B$ obeying (2.12) then
\[ |A + B + nS| = \Omega_n\left( \frac{|A + B|^{2n+1}}{|A|^n |B|^n} \right) \]
where we are using the Landau notation $\Omega()$. This shows that
Lemma 2.13 cannot be significantly improved (except possibly by
improving the bound (2.14)).

2.3.10 Let $A, B$ be additive sets with common ambient group such that
$|A + B| \le K|A|^{1/2}|B|^{1/2}$ for some $K \ge 1$. Using Lemma 2.13 and many
applications of the Ruzsa triangle inequality, establish the estimate
\[ |n_1 A - n_2 A + n_3 B - n_4 B| = O_{n_1,n_2,n_3,n_4}\bigl( K^{O_{n_1,n_2,n_3,n_4}(1)} |A|^{1/2}|B|^{1/2} \bigr) \]
for all integers $n_1, n_2, n_3, n_4$. In particular, establish the bounds
\[ d(n_1 A - n_2 A + n_3 B - n_4 B, n_5 A - n_6 A + n_7 B - n_8 B) \le O_{n_1,\ldots,n_8}(1 + d(A, B)) \]
for all integers $n_1, \ldots, n_8$. We shall improve this bound slightly in
Corollary 2.23 and Corollary 2.24; see also Corollary 2.19 for the “tensor
power trick” that can eliminate lower order terms such as the implicit
constant preceding the $K^{O_{n_1,n_2,n_3,n_4}(1)}$ factor.

2.3.11 Let $G$ and $H$ be subgroups of $Z$. Show that
\[ d(G, H) = \log \frac{|G|^{1/2} |H|^{1/2}}{|G \cap H|}. \]
Conclude that $d(G, H) = d(G, G + H) + d(G + H, H) = d(G, G \cap
H) + d(G \cap H, H)$. Also, if $K$ is another subgroup of $Z$, prove the
contractivity properties $d(G + K, H + K) \le d(G, H)$ and $d(G \cap K, H \cap
K) \le d(G, H)$. Note that the Ruzsa distance, when restricted to subgroups
of $Z$, is indeed a genuine metric, thanks to Proposition 2.7. See
also Exercises 2.4.7 and 2.4.8 below.

2.3.12 Let $A$ be an additive set. Show that
\[ \sigma[A \cup (-A)] \le 2\sigma[A] + \sigma[A]^2. \]
Thus a set with small doubling can be embedded in a symmetric set (i.e.
a set $B$ such that $-B = B$) with small doubling which has at most twice
the cardinality.

2.3.13 [289] Let $A$ be an additive set. Prove the inequalities $|A - A|
\le |A + A|^{4/3}$ and $|A + A| \le |A - A|^{3/2}$. (Hint: use (2.11), Corollary
2.12 and (2.1).)

2.3.14 [26] Let $A$ be an additive set. Show that there exists an element $x \in A + A$
such that the set $F := A \cap (x - A)$ has size $|F| \ge |A|/\sigma[A]$ and doubling
constant $\sigma[F] \le \sigma[A]^2$. Thus every additive set $A$ of small doubling
contains a large symmetric subset $F$ of small doubling, though the set $F$
may be symmetric around a non-zero origin $x/2$.

2.3.15 Let $A, B$ be additive sets with common ambient group $Z$. Show that
$\delta[A] \le e^{2d(A,B)}$ and $\sigma[A] \le e^{6d(A,B)}$. Thus only sets with small doubling
constant can be close to other sets in the Ruzsa metric. (The 6 can be
lowered to a 4, see Exercise 6.5.15.)

2.3.16 Let $A, B$ be additive sets with common ambient group $Z$. Show that
$\sigma[A \cup B] \le e^{3d(A,B)} + 2e^{6d(A,B)}$. Thus a pair of sets which are close in
the Ruzsa metric can be embedded in a slightly larger set with small
doubling. In the converse direction, establish the estimate
\[ d(A, -B) \le \log \sigma[A \cup B] + \frac{1}{2} \log \frac{|A \cup B|}{|A|} + \frac{1}{2} \log \frac{|A \cup B|}{|B|}. \]

2.3.17 Let $A, B$ be additive sets with common ambient group $Z$, such that
$\sigma[A], \sigma[B] \le K$ for some $K \ge 1$, and such that $A \cap B$ is non-empty.
Show that
\[ \sigma[A \cup B] \le 2K + K^3 \frac{\min(|A|, |B|)}{|A \cap B|}. \]
Thus the union of sets with small doubling remains small doubling
provided that those two sets had substantial intersection.

2.3.18 [40], [41] Let $K \ge 1$, and let $A_1, A_2, A_3$ be additive sets with common
ambient group $Z$, such that
\[ |A_j \cap A_3| \ge \frac{1}{K} |A_j| \quad \text{and} \quad |A_j + A_j| \le K|A_j| \]
for all $j = 1, 2, 3$. Prove that $|A_1 + A_2| \le K^5 |A_3|$. Hint: use the triangle
inequality
\[ d(A_1, -A_2) \le d(A_1, -(A_1 \cap A_3)) + d(-(A_1 \cap A_3), A_2 \cap A_3) + d(A_2 \cap A_3, -A_2). \]


2.3.19 Suppose that $A$ and $B$ are subgroups of $Z$, and let $x = y = 0$. Show that
all the inequalities in (2.8) are in fact equalities.

2.3.20 Let $A, B, C$ be additive sets in an ambient group $Z$. Show that
\[ \max(E(A, B), E(A, C))^{1/2} \le E(A, B \cup C)^{1/2} \le E(A, B)^{1/2} + E(A, C)^{1/2}. \]
(Hint: use Lemma 2.9 and the triangle inequality for the $l^2$ norm.)

2.3.21 Let $A, B, C$ be additive sets in an ambient group $Z$ with $|A| = |B| =
|C| = N$. Give examples of such sets where $E(A, B)$ and $E(A, C)$ are
comparable to $N^2$ and $E(B, C)$ is comparable to $N^3$, or where $E(A, B)$
and $E(A, C)$ are comparable to $N^3$ and $E(B, C)$ is comparable to
$N^2$. These examples show that there is no hope of any useful “triangle
inequality” connecting $E(A, B)$, $E(B, C)$, and $E(A, C)$.

2.3.22 Suppose $A, B$ are additive sets in an ambient group $Z$. Show that
$E(A, B) = |A|^2|B|$ holds if and only if $|A + B| = |B|$. One can thus
use Proposition 2.2 to determine when the upper bound in (2.7) is
attained. Conclude in particular that $E(A, B) = |A|^{3/2}|B|^{3/2}$ if and only
if $d(A, B) = 0$, which in turn occurs if and only if $A$ and $B$ are cosets of
the same finite group $G$.

2.3.23 Give an example of an additive set $A \subset \mathbf{Z}$ of cardinality $|A| = N$ such
that $E(A, A)$ is at least a constant multiple of $N^3$ but $d(A, A)$ is at least a
constant multiple of $\log N$. Compare this with (2.8)
(and with Corollary 2.31 below).

2.3.24 Let $A$ be an additive set. Show that there exists a subset $A'$ of $A$ of
cardinality $|A'| \ge \frac{1}{2\sigma[A]} |A|$ and an element $a_0 \in A'$ such that
$|(a + A) \cap (a_0 + A)| \ge \frac{1}{2\sigma[A]} |A|$ for all $a \in A'$. (Hint: first obtain a lower
bound for $E(A, A)$.)








2.4 Covering lemmas 


We now describe some covering lemmas, which roughly speaking have the following
flavor: if $A$ and $B$ have similar additive structure (for instance, if their Ruzsa
distance is small) then one can cover $A$ by a small number of translates of $B$ (or some
modification of $B$).


Lemma 2.14 (Ruzsa’s covering lemma) [300] For any additive sets $A, B$ with
common ambient group $Z$, there exists an additive set $X_+ \subseteq B$ with
\[ B \subseteq A - A + X_+; \quad |X_+| \le \frac{|A + B|}{|A|}; \quad |A + X_+| = |A||X_+| \]
and similarly there exists an additive set $X_- \subseteq B$ with
\[ B \subseteq A - A + X_-; \quad |X_-| \le \frac{|A - B|}{|A|}; \quad |A - X_-| = |A||X_-|. \]
In particular, $B$ can be covered by $\min\bigl(\frac{|A + B|}{|A|}, \frac{|A - B|}{|A|}\bigr)$ translates of $A - A$.


Remark 2.15 One useful side benefit of this covering lemma is that there exist at
least $\frac{|B|}{|A - A|}$ disjoint translates $A + b$ of $A$ with $b \in B$, as can be seen by restricting
$b$ to $X_+$.





Proof It suffices to prove the claim concerning $A + B$, since the claim concerning
$A - B$ follows by replacing $B$ with $-B$ and $X_+$ with $-X_-$ (note that $A - A$ is
symmetric around the origin). Consider the family $\{A + b : b \in B\}$ of translates
of $A$ by elements of $B$. All of these translates have volume $|A|$ and are contained
inside $A + B$. Thus if we take a maximal disjoint sub-family of these translates, i.e.
$\{A + x : x \in X_+\}$ for some $X_+ \subseteq B$, then $X_+$ can have cardinality at most $\frac{|A + B|}{|A|}$.
Also we have $|A + X_+| = |A||X_+|$ by construction. Now for any element $b \in B$,
we see that $A + b$ cannot be disjoint from every member of $\{A + x : x \in X_+\}$ as
this would contradict the maximality of $X_+$. Thus $A + b$ must intersect $A + X_+$,
which implies that $b$ is in $A - A + X_+$. Since $b \in B$ was arbitrary, we thus have
$B \subseteq A - A + X_+$ and the claim follows.
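
The maximal disjoint sub-family in this proof can be produced by a single greedy pass over $B$. The sketch below, for finite sets of integers, is one way to realize the argument and is not the only possible choice of $X_+$; the function name `ruzsa_cover` and the fixed scan order are our own assumptions.

```python
def ruzsa_cover(A, B):
    """Greedily pick X inside B so that the translates A + x (x in X) are
    pairwise disjoint and no further translate A + b can be added."""
    X = set()
    covered = set()                       # the set A + X built so far
    for b in sorted(B):
        translate = {a + b for a in A}
        if covered.isdisjoint(translate):
            X.add(b)
            covered |= translate
    return X

def covering_properties_hold(A, B):
    """Check the three conclusions of Lemma 2.14 for X_+ = ruzsa_cover(A, B)."""
    X = ruzsa_cover(A, B)
    A_plus_B = {a + b for a in A for b in B}
    A_plus_X = {a + x for a in A for x in X}
    cover = {a - a2 + x for a in A for a2 in A for x in X}   # A - A + X
    return (len(X) * len(A) <= len(A_plus_B)
            and len(A_plus_X) == len(A) * len(X)
            and B <= cover)
```

A rejected element $b$ meets the translates chosen before it, and these remain in the final family, so the family returned is indeed maximal.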

















Covering lemmas such as the one above are convenient for a number of reasons.
Firstly, they allow for easy computation of iterated sum sets. For instance, if one
knows that
\[ A + B \subseteq A + X \]
then one can immediately deduce that
\[ A + nB \subseteq A + nX \quad \text{for all } n \ge 0. \]

This is advantageous if $X$ is substantially smaller than $B$. Also, a covering property
such as $A + B \subseteq A + X$ is preserved under Freiman homomorphisms, whereas
bounds such as $|A + A| \le K|A|$ are only preserved by Freiman isomorphisms
(see Chapter 5, in particular Exercise 5.3.13).


Remark 2.16 Observe that we are covering B by A — A rather than by A. This 
reflects the fact that A — A is a “smoother” set than A, and tends to contain fewer 
“holes” that would render it unsuitable for covering other sets. Later on we shall 
see that higher-order sum-difference sets such as 2A — 2A are even smoother, in 
that they tend to contain very large arithmetic progressions; see Section 4.7 and 
Chapter 12 for further discussion. 




One can modify Ruzsa’s covering lemma in a number of ways. For instance, 
one can ensure the covering of B by translates of A — A has very high multiplicity 
(at the cost of increasing the number of covers by a factor of 2). 


Lemma 2.17 (Green-Ruzsa covering lemma) [154] Let A and B be additive sets with common ambient group. Then there exists an additive set $X \subseteq B$ with $|X| \leq 2\frac{|A+B|}{|A|} - 1$ such that for every $y \in B$ there are at least |A|/2 triplets $(x, a, a') \in X \times A \times A$ with x + a - a' = y. More informally, A - A + X covers B with multiplicity at least |A|/2. Furthermore, we have

$$B - B \subseteq A - A + X - X.$$

Similar claims hold if $\frac{|A+B|}{|A|}$ is replaced by $\frac{|A-B|}{|A|}$.

Proof Again it suffices to prove the claim for $\frac{|A+B|}{|A|}$. We perform the following algorithm. Initialize X to be the empty set, so that X + A - A is also the empty set. We now run the following loop. If we cannot find any element y in B which is "sufficiently disjoint from X + A - A" in the sense that $|(y + A) \cap (X + A)| \leq |A|/2$, we terminate the algorithm. Otherwise, if there is such an element y, we add it to X, and then repeat the algorithm.

Every time we add an element to X, the size of |X + A| increases by at least |A|/2, by construction, and at the first stage it increases by |A|. However, X + A must always lie within the set B + A. Thus this algorithm terminates after at most $2\frac{|A+B|}{|A|} - 1$ steps.

Now let y be any element of B. By construction, we have $|(y + A) \cap (X + A)| > |A|/2$, and hence y has at least |A|/2 representations of the form x + a - a' for some $(x, a, a') \in X \times A \times A$, as desired.

Finally, if y and y' are two elements of B, then we have

$$|\{a \in A : y + a \in X + A\}| = |(y + A) \cap (X + A)| > |A|/2$$

and similarly we have $|\{a \in A : y' + a \in X + A\}| > |A|/2$. Thus by the pigeonhole principle there exists $a \in A$ such that $y + a \in X + A$ and $y' + a \in X + A$, thus $y - y' = (y + a) - (y' + a) \in X + A - (X + A) = A - A + X - X$. Since $y, y' \in B$ are arbitrary, we have $B - B \subseteq A - A + X - X$ as claimed.
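The algorithm in this proof can be run directly. The following is a small Python sketch over the integers; the function names `green_ruzsa_cover` and `multiplicity` and the sample sets are assumptions made for illustration only.

```python
def translate(A, y):
    """Return the translate y + A."""
    return {a + y for a in A}


def green_ruzsa_cover(A, B):
    """Keep adding to X any y in B with |(y + A) & (X + A)| <= |A|/2,
    and stop when no such y remains."""
    X, XA = set(), set()  # XA tracks the set X + A
    while True:
        y = next((y for y in sorted(B)
                  if 2 * len(translate(A, y) & XA) <= len(A)), None)
        if y is None:
            return X
        X.add(y)
        XA |= translate(A, y)


def multiplicity(X, A, y):
    """Number of triples (x, a, a2) in X x A x A with x + a - a2 = y."""
    return sum(1 for x in X for a in A for a2 in A if x + a - a2 == y)


A = {0, 1, 3, 7}
B = set(range(0, 30, 2))
X = green_ruzsa_cover(A, B)

AB = {a + b for a in A for b in B}
min_mult = min(multiplicity(X, A, y) for y in B)
BB = {b - b2 for b in B for b2 in B}
AAXX = {a - a2 + x - x2 for a in A for a2 in A for x in X for x2 in X}
```

The loop terminates because every accepted y enlarges X + A by at least |A|/2 elements while X + A stays inside A + B.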














In Section 5.4 we develop yet another covering lemma (Lemma 5.31), in which 
the covering set X is not arbitrary, but is in fact a cube. 

We now give an application of the Green—Ruzsa covering lemma, namely a 
variant of (2.6) which controls quadruple sums rather than double sums. 




Proposition 2.18 Let A, B be additive sets in an ambient group Z. Then

$$|2B - 2B| \leq 16 \frac{|A + B|^4 |A - A|}{|A|^4}.$$

Proof Applying the Green-Ruzsa covering lemma, we may find a set X of cardinality $|X| \leq 2\frac{|A+B|}{|A|}$ such that A - A + X covers B with multiplicity at least |A|/2.

Now let z be any element of B - B. By definition, we have $z = b_1 - b_2$ for some $b_1, b_2 \in B$. By construction of X, we can find at least |A|/2 triplets $(x, a_1, a_2) \in X \times A \times A$ such that $b_2 = x + a_1 - a_2$, and thus

$$|\{(x, a_1, a_2) \in X \times A \times A : z = b_1 - a_1 + a_2 - x\}| \geq |A|/2.$$

Making the change of variables $c := b_1 + a_2 \in A + B$, we conclude that

$$|\{(x, c, a_1) \in X \times (A + B) \times A : z = c - a_1 - x\}| \geq |A|/2.$$

Similarly, if z' is another element of B - B, we have

$$|\{(x', c', a_1') \in X \times (A + B) \times A : z' = c' - a_1' - x'\}| \geq |A|/2,$$

and hence

$$|\{(x, x', c, c', a_1, a_1') \in X \times X \times (A + B) \times (A + B) \times A \times A : z = c - a_1 - x, \ z' = c' - a_1' - x'\}| \geq |A|^2/4.$$

Now write $d := a_1 - a_1' \in A - A$, and observe that if $z = c - a_1 - x$ and $z' = c' - a_1' - x'$ then

$$z - z' = c - c' - d - x + x'.$$

Also, if one fixes z, z', c, c', d, x, x', then $a_1$ and $a_1'$ are determined by the equations $a_1 = c - x - z$, $a_1' = c' - x' - z'$. Thus we have

$$|\{(x, x', c, c', d) \in X \times X \times (A + B) \times (A + B) \times (A - A) : z - z' = c - c' - d - x + x'\}| \geq |A|^2/4.$$

Note that z - z' is an arbitrary element of (B - B) - (B - B) = 2B - 2B. Thus we have shown that an arbitrary element of 2B - 2B has at least $|A|^2/4$ representations of the form c - c' - d - x + x' where $(x, x', c, c', d) \in X \times X \times (A + B) \times (A + B) \times (A - A)$. The claim then follows since $|X| \leq 2\frac{|A+B|}{|A|}$.














We can eliminate the factor of 16 by the following elegant “tensor power trick” 
of Ruzsa [297]: 


Corollary 2.19 Let A, B be additive sets in an ambient group Z. Then

$$|2B - 2B| \leq \frac{|A + B|^4 |A - A|}{|A|^4}.$$







Proof Fix A, B, and let M be a large integer parameter. We consider the M-fold Cartesian product $A^{\oplus M} := A \times \cdots \times A$, which is a subset of the additive group $Z^{\oplus M} := Z \oplus \cdots \oplus Z$; similarly consider $B^{\oplus M}$. Then one easily verifies

$$2B^{\oplus M} - 2B^{\oplus M} = (2B - 2B)^{\oplus M};$$
$$A^{\oplus M} + B^{\oplus M} = (A + B)^{\oplus M};$$
$$A^{\oplus M} - A^{\oplus M} = (A - A)^{\oplus M}.$$

Thus by applying Proposition 2.18 with A, B replaced by $A^{\oplus M}$, $B^{\oplus M}$ we obtain

$$|2B - 2B|^M \leq 16 \frac{|A + B|^{4M} |A - A|^M}{|A|^{4M}}.$$

Taking Mth roots of both sides and letting $M \to \infty$, we obtain the result.
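The two ingredients of the tensor power trick can be checked on a toy case. The sketch below (Python, integers as ambient group, with illustrative helper names `power` and `vec_sumset`) verifies the product identity for sum sets on one example and shows how the constant 16 washes out under Mth roots.

```python
from itertools import product


def sumset(A, B):
    """Return A + B for sets of integers."""
    return {a + b for a in A for b in B}


def power(A, M):
    """Return the M-fold Cartesian product of A as a set of tuples."""
    return set(product(sorted(A), repeat=M))


def vec_sumset(S, T):
    """Return S + T for sets of equal-length integer tuples."""
    return {tuple(s + t for s, t in zip(u, v)) for u in S for v in T}


A = {0, 1, 4}
B = {0, 2, 3}
M = 3

# |A^M + B^M| should equal |A + B|^M.
lhs = len(vec_sumset(power(A, M), power(B, M)))
rhs = len(sumset(A, B)) ** M

# The constant 16 becomes 16^(1/M), which tends to 1 as M grows.
roots = [16 ** (1 / m) for m in (1, 10, 100, 1000)]
```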














Specializing Corollary 2.19 to the case B := -A, we obtain

Corollary 2.20 Let A be an additive set. Then

$$|2A - 2A| \leq \frac{|A - A|^5}{|A|^4}$$

or, in other words,

$$d(A - A, A - A) \leq 4 d(A, A).$$
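As a sanity check, the inequality of Corollary 2.20 can be tested on a few small sets of integers. This is only an illustrative Python sketch; the example sets and the function name `satisfies_corollary_2_20` are arbitrary choices, not taken from the text.

```python
def diffset(A, B):
    """Return the difference set A - B."""
    return {a - b for a in A for b in B}


def satisfies_corollary_2_20(A):
    """Check |2A - 2A| * |A|^4 <= |A - A|^5 using exact integer arithmetic."""
    D = diffset(A, A)    # A - A
    DD = diffset(D, D)   # (A - A) - (A - A) = 2A - 2A
    return len(DD) * len(A) ** 4 <= len(D) ** 5


examples = [
    set(range(10)),             # arithmetic progression, small doubling
    {0, 1, 3, 7, 15, 31},       # sparse set, large difference set
    {0, 1, 10, 11, 100, 101},   # a two-dimensional progression
]
results = [satisfies_corollary_2_20(A) for A in examples]
```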


Remark 2.21 One can improve these estimates slightly by using the machinery 
of Plünnecke inequalities; see Corollary 6.28. 


Combining Corollary 2.20 with the Ruzsa covering lemma (Lemma 2.14 with 
B = 2A — A) we obtain 


Corollary 2.22 For any additive set A, 2A - A can be covered by $\delta[A]^5$ translates of A - A.

This then shows that 3A - A is covered by $\delta[A]^5$ translates of 2A - A, and hence by $\delta[A]^{10}$ translates of A - A. Continuing in this fashion, an easy induction then shows

$$mA - nA \text{ can be covered by } \delta[A]^{5(m+n-2)} \text{ translates of } A - A \quad (2.16)$$

for all $m, n \geq 1$. In particular we have

$$|mA - nA| \leq \delta[A]^{5(m+n)-9} |A| \quad \text{for all } m, n \geq 1. \quad (2.17)$$

From this (and the trivial estimates $|kA| \geq |A|$ for any $k \geq 1$) we obtain 


Corollary 2.23 (Symmetric sum set estimates, preliminary version) Let A be an additive set. Then we have the estimates

$$d(n_1 A - n_2 A, n_3 A - n_4 A) \leq 5(n_1 + n_2 + n_3 + n_4) d(A, A)$$

for any non-negative integers $n_1, n_2, n_3, n_4$. (The constant 5 is not best possible; we will improve it later.) 


Thus if A has small difference constant, then in fact all iterated sum sets of A 
are close to each other in the Ruzsa metric. Another consequence of the corollary 
is that 


$$\sigma[n_1 A - n_2 A] \leq \delta[A]^{10(n_1 + n_2)}$$

for all non-negative integers $n_1, n_2$. The factor of 10 is not best possible; we shall obtain improvements to this constant later when we develop the machinery of Plünnecke inequalities in Section 6.5. However, the linear growth in $n_1$ and $n_2$ is necessary; see Exercise 2.4.9. 

By combining the above corollary with the Ruzsa triangle inequality one can 
obtain similar estimates for pairs of sets: 


Corollary 2.24 (Asymmetric sum set estimates, preliminary version) Let A, B be additive sets with common ambient group Z. Then we have the estimates

$$d(n_1 A - n_2 A + n_3 B - n_4 B, \ n_5 A - n_6 A + n_7 B - n_8 B) = O((n_1 + \cdots + n_8) d(A, B))$$

for any $n_1, \ldots, n_8 \in \mathbf{N}$.


The proof is left as an exercise. 
We can use the above machinery to place additive sets with small difference or 
doubling constant inside a more structured set, namely an “approximate group”. 


Definition 2.25 (Approximate groups) Let $K \geq 1$. An additive set H is said to 
be a K-approximate group if it is symmetric (so H = -H), contains the origin, 
and H + H can be covered by at most K translates of H. 


Observe that a 1-approximate group is necessarily a finite group, and conversely 
every finite group is a 1-approximate group. 
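For a concrete instance of Definition 2.25, a symmetric interval of integers is a 2-approximate group. The following Python sketch checks the three conditions for a proposed list of translates; the function names and the explicit `centers` argument are illustrative assumptions (finding an optimal cover is not attempted).

```python
def is_covered(H, centers):
    """Check that H + H lies in the union of the translates H + c, c in centers."""
    HH = {h1 + h2 for h1 in H for h2 in H}
    cover = {h + c for h in H for c in centers}
    return HH <= cover


def is_approximate_group(H, centers):
    """Check that H is symmetric, contains 0, and that `centers` witnesses
    the covering of H + H, so that H is a K-approximate group with
    K = len(centers)."""
    symmetric = {-h for h in H} == set(H)
    return symmetric and 0 in H and is_covered(H, centers)


N = 10
H = set(range(-N, N + 1))                  # the interval [-N, N]
ok = is_approximate_group(H, {-N, N})      # [-2N, 2N] = (H - N) | (H + N)
one_translate = is_covered(H, {0})         # a single translate is not enough
```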

We can summarize many of the preceding results by giving the following partial 
generalization of Proposition 2.7. 


Proposition 2.26 Let A be an additive set and let $K \geq 1$. Then the following statements are equivalent up to constants, in the sense that if the jth property holds for some absolute constant $C_j$, then the kth property will also hold for some absolute constant $C_k$ depending on $C_j$:

(i) $\sigma[A] \leq K^{C_1}$ (i.e. $|A + A| \leq K^{C_1}|A|$);
(ii) $\delta[A] \leq K^{C_2}$ (equivalently, $d(A, A) \leq C_2 \log K$ or $|A - A| \leq K^{C_2}|A|$);
(iii) $d(A, B) \leq C_3 \log K$ for at least one additive set B;
(iv) $|nA - mA| \leq K^{C_4(n+m)}|A|$ for all non-negative integers n, m;
(v) there exists a $K^{C_5}$-approximate group H such that $A \subseteq x + H$ for all $x \in A$, and furthermore $|A| \geq K^{-C_5}|H|$.


Proof The equivalence of the first three properties follows from the Ruzsa triangle inequality and (2.11). The equivalence of the fourth property with (say) the second follows from Corollary 2.24. To see that the fifth property implies (say) the first, observe that if the former holds, then

$$|A + A| \leq |H + H| \leq K^{C_5}|H| \leq K^{2C_5}|A|.$$

To deduce the fifth from the fourth, take H = A - A and apply the Ruzsa covering lemma. 














Thus, in a qualitative sense, we have reduced the study of additive sets with 
small difference or doubling constant to the study of approximate groups, or more 
precisely to the study of dense subsets of translates of approximate groups. This is a 
fairly satisfactory state of affairs, except for the fact that we do not have a good 
characterization of which sets are approximate groups. The well known structure 
theorem for finite groups (see Corollary 3.8 below) asserts that every finite group is 
the product of finite cyclic groups; we shall eventually be able to obtain a somewhat 
similar characterization of approximate groups, showing that they are efficiently 
contained in a generalized arithmetic progression. For some other properties of 
approximate groups, see the exercises below. 

There is an asymmetric counterpart to Proposition 2.26, whose proof we leave 
as an exercise. 


Proposition 2.27 Let A, B be additive sets in an ambient group Z, and let $K \geq 1$. Then the following statements are equivalent up to constants, in the sense that if the jth property holds for some absolute constant $C_j$, then the kth property will also hold for some absolute constant $C_k$ depending on $C_j$:

(i) $d(A, B) \leq C_1 \log K$;
(ii) $d(A, -B) \leq C_2 \log K$;
(iii) $|A + B| \leq K^{C_3} \min(|A|, |B|)$;
(iv) $|A - B| \leq K^{C_4} \min(|A|, |B|)$;
(v) $|n_1 A - n_2 A + n_3 B - n_4 B| \leq K^{C_5(n_1 + n_2 + n_3 + n_4)}|A|$ for all non-negative integers $n_1, n_2, n_3, n_4$;
(vi) $\sigma[A], \sigma[B] \leq K^{C_6}$, and there exists $x \in Z$ such that $|A \cap (B + x)| \geq K^{-C_6}|A|^{1/2}|B|^{1/2}$;
(vii) $\sigma[A], \sigma[B] \leq K^{C_7}$, and $E(A, B) \geq K^{-C_7}|A|^{3/2}|B|^{3/2}$;
(viii) there exists a $K^{C_8}$-approximate group H such that $A \subseteq H + a$ and $B \subseteq H + b$ for all $a \in A$, $b \in B$, and furthermore $|A|, |B| \geq K^{-C_8}|H|$.





Observe that Exercise 2.3.7 is essentially the K = 1 case of this Proposition. 




Proposition 2.27 gives a satisfactory characterization of pairs of sets with small 
Ruzsa distance, in terms of approximate groups, provided that one is ready to lose 
some absolute constants in the exponents. Note however that it is restricted to 
treating those sets A, B which are comparable in magnitude up to powers of K 
(cf. Exercise 2.3.6). A partial analogue of this proposition exists in the case when 
A and B are very different in magnitude, but the theory here is not as satisfactory; 
see Section 2.6. 


Exercises 


2.4.1 Let Z be a finite additive group, and let A be a random subset of Z such that the events $a \in A$ are independent with probability 3/4 for all $a \in Z$. Show that with probability $1 - o_{|Z| \to \infty}(1)$, |A| > |Z|/2 (so in particular A + A = A - A = Z, by Exercise 2.1.6), but that it is not possible to cover Z using fewer than $\frac{1}{5} \log |Z|$ translates of A. (Hint: if X is an additive set with $|X| \leq \frac{1}{5} \log |Z|$, use Lemma 2.14 to find an additive set Y with $|Y| = \Omega(|Z| / \log^2 |Z|)$ such that the translates y - X are disjoint for all $y \in Y$. Compute the probability that A is disjoint from at least one of the sets y - X, and conclude an upper bound for the probability that A + X = Z. Now take the union bound over all choices of X.) This shows that we cannot replace A - A by A in Lemma 2.14 without admitting some sort of logarithmic loss.

2.4.2 Let A be an additive set in a group Z, and let $\phi : Z \to Z'$ be a group homomorphism. Establish the inequalities

$$|A| \leq |\phi(A)| \sup_{x \in Z'} |A \cap \phi^{-1}(x)| \leq |2A|.$$

(Hint: use the Ruzsa covering lemma to cover A by translates of a subset of $\phi^{-1}(0)$.) In particular equality is attained in both inequalities when A is the coset of a group.

2.4.3 Prove Corollary 2.24. What value of the implicit constant in the O() notation do you get?

2.4.4 Let A be an additive set such that |2A - 2A| < 2|A|. Conclude that A - A is a group. (Hint: use Lemma 2.14.) From this and Corollary 2.19 we see that if $|A - A| < 2^{1/5}|A|$, then A - A is a group. The constant $2^{1/5}$ can be improved to $\frac{3}{2}$; see Exercise 2.6.5 below.

2.4.5 Let G be a K-approximate group for some integer $K \geq 1$. Show that $|nG| \leq \binom{n+K-2}{K-1}|G|$ for all integers $n \geq 1$. Conclude in particular the bounds

$$|nG| \leq \min(K^n, n^{K-1})|G| \quad \text{for all } n \geq 1;$$

thus the numbers |nG| grow exponentially in n for $n \leq K$ but settle down to become polynomial growth for n > K. In fact for any additive set, |nA| is a polynomial in n for sufficiently large n; see [261] for a proof of this fact and some further discussion.

2.4.6 Let A be an additive set with doubling constant $\sigma[A] = K$ for some $K \geq 1$. Show that

$$|nA| \leq \min(K^{Cn}, n^{K^C - 1})|A|$$

for all $n \geq 1$ and some absolute constant C > 0. (Note that if K is very close to 1, then one can use Exercise 2.4.4 to obtain a much stronger bound.)

2.4.7 Let G be a K-approximate group in an ambient group Z, and let H be a K'-approximate group in Z. Show that G + H is a KK'-approximate group. Show that $2G \cap 2H$ is a $(KK')^3$-approximate group. (Hint: first show that $(2G \cap 2H) - (2G \cap 2H) \subseteq (G + X) \cap (H + Y)$ for some X, Y of cardinality at most $K^3$ and $(K')^3$ respectively, and then show that each set of the form $(G + x) \cap (H + y)$ is contained in a translate of $2G \cap 2H$.) Modify Exercise 2.2.9 to show that this type of statement fails quite badly if the set $2G \cap 2H$ is replaced by $G \cap H$. Also, establish the cardinality bounds

$$\frac{|G||H|}{|G + H|} \leq |2G \cap 2H| \leq (KK')^3 \frac{|G||H|}{|G + H|}.$$

(Hint: use (2.8) for the lower bound, and the Ruzsa triangle inequality for the upper bound.) Conclude the estimates

$$d(G, H) \leq d(G, G + H) + d(G + H, H) \leq d(G, H) + \log KK'$$

and

$$d(G, H) \leq d(G, 2G \cap 2H) + d(2G \cap 2H, H) \leq d(G, H) + 3 \log KK',$$

and compare this with Exercise 2.3.11.

2.4.8 For each j = 1, 2, 3, let $G_j$ be a $K_j$-approximate group in an ambient group Z. Using the Ruzsa triangle inequality, show that

$$|G_1 + G_2 + G_3| \leq K_2 \frac{|G_1 + G_2||G_2 + G_3|}{|G_2|}.$$

Conclude that

$$d(G_1 + G_2, G_1 + G_2 + G_3) \leq d(G_2, G_2 + G_3) + \log K_1 K_2.$$

Similarly for permutations. Conclude from this and the preceding exercise that

$$d(G_1, G_2) \leq d(G_1 + G_3, G_2 + G_3) + 2 \log K_1 K_2 K_3$$

and compare this with Exercise 2.3.11. (A corresponding statement exists for intersections but is somewhat tricky to establish.)

2.4.9 For any integers $K, n_1, n_2 \geq 1$, give an example of an additive set A with $\sigma[A] = K$ and $\sigma[n_1 A - n_2 A] = \Omega_{n_1, n_2}(K^{n_1 + n_2})$.

2.4.10 Let A, B be additive sets in a common ambient group Z. Show that $\sigma[A + B] \leq (\sigma[A]\sigma[B])^C$ where $C \geq 1$ is an absolute constant. (Hint: use Proposition 2.26 to place A and B inside translates of approximate groups. To obtain lower bounds on |A + B|, use the inequality

$$|A + B| \geq \frac{|A||B|}{|(A - A) \cap (B - B)|}$$

from (2.8).)

2.4.11 Prove Proposition 2.27. (Hint: to construct the approximate group H, one possible choice is H = A - A + B - B.)

2.4.12 Try to improve upon the constant 5 in (2.17), by using the Ruzsa triangle inequality instead of the Ruzsa covering lemma. This exercise demonstrates that the triangle inequality is slightly sharper than the covering lemma when one wants cardinality bounds, but the covering lemmas of course give much more information than just cardinality.

2.4.13 [209] Let A, B be additive sets in an ambient group Z, and let G be the group generated by A. Show that there exists an additive set $B' \subseteq B$ such that B' is contained in a coset of G, and such that $|A + B'| \leq \frac{|B'|}{|B|}|A + B|$.

2.4.14 Let A, B, A', B' be additive sets with common ambient group Z. Establish the inequality $d(A + A', B + B') = O(d(A, B) + d(A', B'))$. (Hint: argue as in Exercise 2.4.10.) Conclude that if $\phi : Z \to Z'$ is a group homomorphism, then $d(\phi(A), \phi(B)) = O(d(A, B))$. Thus group homomorphisms are "Lipschitz" with respect to the Ruzsa distance.


2.5 The Balog-Szemerédi-Gowers theorem 


In the previous sections we have only considered complete sum sets A + B and 
complete difference sets A — B. In many applications one only controls a partial 
collection of sums and differences. Fortunately, there is a very useful tool, the 
Balog—Szemerédi-Gowers theorem, which allows one to pass from control of 
partial sum and difference sets to control of complete sum and difference sets 
(after refining the sets slightly). We begin with some notation. 




Definition 2.28 (Partial sum sets) If A, B are additive sets with common ambient group Z, and G is a subset of $A \times B$, we define the partial sum set

$$A \overset{G}{+} B := \{a + b : (a, b) \in G\}$$

and the partial difference set

$$A \overset{G}{-} B := \{a - b : (a, b) \in G\}.$$


One may like to think of G as a bipartite graph connecting A and B. Note 
that when G = A x B is complete, then the notion of partial sum set and partial 
difference set collapse to just the complete sum set and difference set. 

Partial sum sets and partial difference sets are not as nice to work with algebraically as complete sum sets. In particular, the above machinery of sum set estimates does not directly yield any conclusion if one only assumes that the cardinality $|A \overset{G}{+} B|$ of a partial sum set is small. Note that even when G is very large, it is possible for $|A \overset{G}{+} B|$ to be small while |A + B| is large; see exercises. Fortunately, the Balog-Szemerédi-Gowers theorem, which we will present shortly, does allow us to conclude information on complete sum sets from information on partial sum sets, if we are willing to refine A and B by a small factor (i.e. replace A and B by subsets A' and B' which are only slightly smaller than A and B). 

The first result in this direction was by Balog and Szemerédi [16], using the 
regularity lemma. A different, more effective proof, was found by Gowers [137] 
(with a slight refinement by Bourgain [38]), in particular with dependence of 
constants that are only polynomial in nature. Here we present a modern formulation 
of the theorem, following [340]. 


Theorem 2.29 (Balog-Szemerédi-Gowers theorem) Let A, B be additive sets in an ambient group Z, and let $G \subseteq A \times B$ be such that

$$|G| \geq |A||B|/K \quad \text{and} \quad |A \overset{G}{+} B| \leq K'|A|^{1/2}|B|^{1/2}$$

for some $K \geq 1$ and K' > 0. Then there exist subsets $A' \subseteq A$, $B' \subseteq B$ such that

$$|A'| \geq \frac{|A|}{4\sqrt{2}K}, \quad (2.18)$$
$$|B'| \geq \frac{|B|}{4K}, \quad (2.19)$$
$$|A' + B'| \leq 2^{12} K^4 (K')^3 |A|^{1/2}|B|^{1/2}. \quad (2.20)$$

In particular we have

$$d(A', -B') \leq 5 \log K + 3 \log K' + O(1).$$




The proof of this theorem is graph-theoretical. It is elementary, but a little 
lengthy and so we postpone it to Section 6.4. One can of course combine this 
theorem with Corollary 2.24 and Proposition 2.26 to gain more information on 
the iterated sum and difference sets of A' and B'. It is likely that the factor of 
$2^{12} K^4 (K')^3$ in (2.20) can be improved. However, the bounds (2.18), (2.19) cannot 
be significantly improved; see exercises. 

To apply the Balog—Szemerédi—Gowers theorem, it is convenient to introduce 
the following lemma connecting large additive energy to small partial sum sets or 
small partial difference sets. 


Lemma 2.30 Let A, B be additive sets in an ambient group Z, and let G be a non-empty subset of $A \times B$. Then

$$E(A, B) \geq \frac{|G|^2}{|A \overset{G}{+} B|}, \ \frac{|G|^2}{|A \overset{G}{-} B|}.$$

Conversely, if $E(A, B) \geq |A|^{3/2}|B|^{3/2}/K$ for some $K \geq 1$, then there exists $G \subseteq A \times B$ such that

$$|G| \geq |A||B|/2K; \quad |A \overset{G}{+} B| \leq 2K|A|^{1/2}|B|^{1/2},$$

and similarly there exists $H \subseteq A \times B$ such that

$$|H| \geq |A||B|/2K; \quad |A \overset{H}{-} B| \leq 2K|A|^{1/2}|B|^{1/2}.$$

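Proof Observe that

$$\sum_{x \in A \overset{G}{+} B} |\{(a, b) \in G : a + b = x\}| = |G|$$

and hence by Cauchy-Schwarz

$$\sum_{x \in A \overset{G}{+} B} |\{(a, b) \in G : a + b = x\}|^2 \geq \frac{|G|^2}{|A \overset{G}{+} B|}.$$

But the left-hand side is equal to

$$|\{(a, a', b, b') \in A \times A \times B \times B : a + b = a' + b'; \ (a, b), (a', b') \in G\}|$$

which is at most E(A, B). This proves that $E(A, B) \geq |G|^2/|A \overset{G}{+} B|$; using the symmetry E(A, B) = E(A, -B) we thus also obtain $E(A, B) \geq |G|^2/|A \overset{G}{-} B|$.

Now assume $E(A, B) \geq |A|^{3/2}|B|^{3/2}/K$. Then by Lemma 2.9 we have

$$\sum_{x \in A + B} |A \cap (x - B)|^2 \geq \frac{|A|^{3/2}|B|^{3/2}}{K}.$$

If we set $S := \{x \in A + B : |A \cap (x - B)| \geq |A|^{1/2}|B|^{1/2}/2K\}$, we then have (by Lemma 2.9 again)

$$\sum_{x \in S} |A \cap (x - B)|^2 \geq \frac{|A|^{3/2}|B|^{3/2}}{K} - \frac{|A|^{1/2}|B|^{1/2}}{2K}|A||B| = \frac{|A|^{3/2}|B|^{3/2}}{2K}.$$

Now observe from Lemma 2.9 again that

$$\frac{|S||A|^{1/2}|B|^{1/2}}{2K} \leq \sum_{x \in S} |A \cap (x - B)| \leq |A||B|$$

and hence

$$|S| \leq 2K|A|^{1/2}|B|^{1/2}.$$

Now let $G := \{(a, b) \in A \times B : a + b \in S\}$; then clearly $A \overset{G}{+} B \subseteq S$ and hence

$$|A \overset{G}{+} B| \leq 2K|A|^{1/2}|B|^{1/2}.$$

Furthermore, since $|A \cap (x - B)| \leq \min(|A|, |B|) \leq |A|^{1/2}|B|^{1/2}$, we have

$$|G| = \sum_{x \in S} |\{(a, b) \in A \times B : a + b = x\}| = \sum_{x \in S} |A \cap (x - B)| \geq \sum_{x \in S} \frac{|A \cap (x - B)|^2}{|A|^{1/2}|B|^{1/2}} \geq \frac{|A|^{3/2}|B|^{3/2}/2K}{|A|^{1/2}|B|^{1/2}} = |A||B|/2K.$$

This gives the desired set G. The construction of H follows by using the symmetry E(A, B) = E(A, -B). 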
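Both halves of Lemma 2.30 can be checked numerically on small examples. The sketch below is in Python over the integers; the function name `energy`, the choice of A = B = {0, ..., 19} and the randomly chosen graph G are assumptions made purely for illustration.

```python
import random
from collections import Counter


def energy(A, B):
    """Additive energy E(A, B) = #{(a, a', b, b') : a + b = a' + b'}."""
    r = Counter(a + b for a in A for b in B)
    return sum(c * c for c in r.values())


A = set(range(20))
B = set(range(20))
E = energy(A, B)

# First half: E(A, B) >= |G|^2 / |partial sum set| for an arbitrary G.
random.seed(0)
pairs = sorted((a, b) for a in A for b in B)
G = set(random.sample(pairs, 100))
partial_sum = {a + b for a, b in G}
partial_diff = {a - b for a, b in G}
first_half = (E * len(partial_sum) >= len(G) ** 2
              and E * len(partial_diff) >= len(G) ** 2)

# Converse: keep only the pairs whose sum is "popular".
K = (len(A) * len(B)) ** 1.5 / E          # so that E = |A|^{3/2}|B|^{3/2}/K
r = Counter(a + b for a in A for b in B)  # r[x] = |A & (x - B)|
threshold = (len(A) * len(B)) ** 0.5 / (2 * K)
S = {x for x, c in r.items() if c >= threshold}
G2 = {(a, b) for a in A for b in B if a + b in S}
```

In this example the partial sum set of `G2` is exactly `S`, so the two conclusions of the converse read `len(G2) >= |A||B|/2K` and `len(S) <= 2K|A|^{1/2}|B|^{1/2}`.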














Combining this Lemma with the Balog—Szemerédi—Gowers theorem, we can 
obtain a characterization of pairs of sets with large additive energy. 


Theorem 2.31 (Balog-Szemerédi-Gowers theorem, alternative version) Let A, B be additive sets in an ambient group Z, and let $K \geq 1$. Then the following statements are equivalent up to constants, in the sense that if the jth property holds for some absolute constant $C_j$, then the kth property will also hold for some absolute constant $C_k$ depending on $C_j$:

(i) $E(A, B) \geq K^{-C_1}|A|^{3/2}|B|^{3/2}$;
(ii) there exists $G \subseteq A \times B$ such that $|G| \geq K^{-C_2}|A||B|$ and $|A \overset{G}{+} B| \leq K^{C_2}|A|^{1/2}|B|^{1/2}$;
(iii) there exists $G \subseteq A \times B$ such that $|G| \geq K^{-C_3}|A||B|$ and $|A \overset{G}{-} B| \leq K^{C_3}|A|^{1/2}|B|^{1/2}$;
(iv) there exist subsets $A' \subseteq A$, $B' \subseteq B$ with $|A'| \geq K^{-C_4}|A|$, $|B'| \geq K^{-C_4}|B|$, and $d(A', B') \leq C_4 \log K$;
(v) there exists a $K^{C_5}$-approximate group H and $x, y \in Z$ such that $|A \cap (H + x)|, |B \cap (H + y)| \geq K^{-C_5}|H|$ and $|A|, |B| \leq K^{C_5}|H|$.


We leave the proof of this theorem to the exercises. Theorem 2.31 should be 
compared with Exercise 2.3.22, which is the K = 1 case of this Theorem. As 
with Proposition 2.27, this Theorem is restricted to sets A, B which are close in 
cardinality (see exercises). We shall address the question of sets A, B of widely 
differing cardinalities in the next section. 


Exercises 


2.5.1 Let A, B be additive sets with common ambient group Z such that $E(A, B) \geq K^{-1}|A|^{3/2}|B|^{3/2}$. Show that $K^{-2}|A| \leq |B| \leq K^2|A|$, and show by means of an example that these bounds cannot be improved.

2.5.2 Give an example of an additive set $A \subset \mathbf{Z}$ of cardinality N, and a set $G \subseteq A \times A$ of cardinality $N^2/4$, such that $|A \overset{G}{+} A| < N$ but $|A + A| \geq N^2/8$. (Hint: concatenate a Sidon set with an arithmetic progression.)

2.5.3 Let N > K > 1 be large integers, with N a multiple of K. Give an example of sets $A, B \subset \mathbf{Z}$ of cardinality |A| = |B| = N and a subset $G \subseteq A \times B$ of cardinality |G| = |A||B|/K with the property that $|A \overset{G}{+} B| \leq 2N$, but such that $|A'' + B''| \geq N^2/K^2$ whenever $A'' \subseteq A$ and $B'' \subseteq B$ is such that $|A''| \geq 2|A|/K$. (Hint: take B to be a long progression, and take A to be a short progression concatenated with some generic integers.) This shows that the conditions (2.18), (2.19) in Theorem 2.29 cannot be significantly improved.

2.5.4 Let A, B, C be additive sets in an ambient group Z, let $0 < \varepsilon < 1/4$, and let $G \subseteq A \times B$, $H \subseteq B \times C$ be such that $|G| \geq (1 - \varepsilon)|A||B|$ and $|H| \geq (1 - \varepsilon)|B||C|$. Show that there exist subsets $A' \subseteq A$ and $C' \subseteq C$ with $|A'| \geq (1 - \varepsilon^{1/2})|A|$ and $|C'| \geq (1 - \varepsilon^{1/2})|C|$ such that

$$|A' - C'| \leq \frac{|A \overset{G}{-} B||B \overset{H}{-} C|}{(1 - 2\varepsilon^{1/2})|B|}.$$

(Hint: show that at most $\varepsilon^{1/2}|B|$ elements of B have a G-degree of less than $(1 - \varepsilon^{1/2})|A|$, and similarly at most $\varepsilon^{1/2}|B|$ elements have an H-degree of less than $(1 - \varepsilon^{1/2})|C|$.) This result can be used as a substitute for the Balog-Szemerédi-Gowers theorem in the case when the graph G is extremely dense; it has the advantage that it does not require A, B, C to be comparable in size and it does not lose any constants in the limit $\varepsilon \to 0$; indeed it collapses to Ruzsa's triangle inequality in that limit.

2.5.5 Prove Theorem 2.31. (Hint: for K large, e.g. K > 1.1, one can use the Balog-Szemerédi-Gowers theorem and Proposition 2.27. For K small, e.g. $1 \leq K \leq 1.1$, one can use Exercise 2.5.4 as a substitute for the Balog-Szemerédi-Gowers theorem.) 

2.5.6 [80] Let A, B be additive sets with common ambient group such that |A| = |B| = N and $|A + A| \leq KN$. Suppose also that $|A \overset{G}{+} B| \leq KN$, where $G \subseteq A \times B$ is a bipartite graph such that every element of B is connected to at least $K^{-1}N$ elements of A. Show that $|A + B| \leq K^{O(1)}N$ and $|B + B| \leq K^{O(1)}N$. (Hint: write the elements of A + B in the form x - y + z where $x \in A + A$, $y \in A + A$, and $z \in A \overset{G}{+} B$.)

2.5.7 [80] Let A be an additive set such that $|A \overset{G}{+} A| \leq K|A|$, where $G \subseteq A \times A$ is such that every element of A is connected via G to at least $K^{-1}|A|$ elements of A. Show that one can partition A into $O(K^{O(1)})$ subsets $A_1, \ldots, A_m$ such that $|A_i + A_i| = O(K^{O(1)}|A|)$ for each $1 \leq i \leq m$. (Hint: use the Balog-Szemerédi-Gowers theorem and an iteration argument to obtain most of the subsets, and then Exercise 2.5.6 to deal with the remainder.) 


2.6 Symmetry sets and imbalanced partial sum sets 


The Balog-Szemerédi-Gowers theorem is a very powerful tool when studying two additive sets A, B with additive energy E(A, B) close to $|A|^{3/2}|B|^{3/2}$; however from (2.7) we see that this situation only occurs when |A| and |B| are comparable in size. This leaves open the question of what happens in the case $|A| \gg |B|$ (say) and E(A, B) is close to the upper bound of $|A||B|^2$ given by (2.7). A special sub-case of this (thanks to (2.8)) is the case when |A + B| or |A - B| is comparable to |A|. Note that Proposition 2.2 already gives an answer to this question in the extreme case when |A + B| = |A| or |A - B| = |A| (or equivalently if $E(A, B) = |A||B|^2$; see Exercise 2.3.22). However, an example of Ruzsa [297] shows that things become bad when |A| and |B| are very widely separated; see the exercises. 

If however we are prepared to endure logarithmic-type losses in the ratio |A|/|B| (or more precisely losses of the form $(|A|/|B|)^{\varepsilon}$ where $\varepsilon$ can be chosen to be small), then one can recover a reasonable theory. In analogy with Proposition 2.2, one expects that if |A + B| is comparable to |A|, or if E(A, B) is close to $|A||B|^2$, then there should be an approximate group H such that A is approximately the union of translates of H, and B is approximately contained in a single translate of H. To achieve this will be the main objective of this section.

In the extreme case when |A + B| = |A| or $E(A, B) = |A||B|^2$, the approximate group H was in fact an exact group and in the proof of Proposition 2.2 it was constructed as the symmetry group $\mathrm{Sym}_1(A)$ of the larger additive set A. In the general case this symmetry group is likely to be trivial. However, a more general notion is still useful. 


Definition 2.32 (Symmetry sets) Let (A, Z) be an additive set. For any non-negative real number $\alpha \geq 0$, define the symmetry set $\mathrm{Sym}_\alpha(A) \subseteq Z$ at threshold $\alpha$ to be the set

$$\mathrm{Sym}_\alpha(A) := \{h \in Z : |A \cap (A + h)| \geq \alpha|A|\}.$$

Note that $\mathrm{Sym}_1(A) = \{h \in Z : A + h = A\}$ is the same symmetry group applied in the proof of Proposition 2.2. The other symmetry sets are not groups in general, but nevertheless they are still symmetric (so $-\mathrm{Sym}_\alpha(A) = \mathrm{Sym}_\alpha(A)$) and contain the origin, and they obey the nesting property $\mathrm{Sym}_\alpha(A) \subseteq \mathrm{Sym}_\beta(A)$ for $\alpha \geq \beta$. It is also clear that $\mathrm{Sym}_\alpha(A) \subseteq A - A$ for all $0 < \alpha \leq 1$. Note that as $\mathrm{Sym}_\alpha(A)$ is empty for $\alpha > 1$ and equal to all of Z for $\alpha \leq 0$, we shall mostly restrict ourselves to the non-trivial region where $0 < \alpha \leq 1$. 

We now relate the size of these symmetry sets to the additive energy. From Lemma 2.9 we have

$$E(A, A) = \sum_{h \in A - A} |A \cap (A + h)|^2$$

and hence for any $0 < \alpha \leq 1$ and the crude bounds $|A \cap (A + h)| \leq |A|$ when $h \in \mathrm{Sym}_\alpha(A)$ and $|A \cap (A + h)| < \alpha|A|$ when $h \notin \mathrm{Sym}_\alpha(A)$, we have

$$\alpha^2|A|^2|\mathrm{Sym}_\alpha(A)| \leq E(A, A) \leq \alpha^2|A|^2|A - A| + |A|^2|\mathrm{Sym}_\alpha(A)|,$$

which indicates that $\mathrm{Sym}_\alpha(A)$ should be large whenever the energy is large. In particular, from (2.7) we have

$$|\mathrm{Sym}_\alpha(A)| \leq |A|/\alpha^2. \quad (2.21)$$
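The symmetry sets and the bounds just derived can be computed directly for small examples. The following Python sketch works over the integers and uses exact rational thresholds; the names `overlap` and `sym` and the sample set are illustrative assumptions.

```python
from fractions import Fraction


def overlap(A, h):
    """Return |A & (A + h)|."""
    return len(A & {a + h for a in A})


def sym(A, alpha):
    """Symmetry set Sym_alpha(A) for alpha > 0; it lies inside A - A,
    so it is enough to scan the difference set."""
    diffs = {a - b for a in A for b in A}
    return {h for h in diffs if overlap(A, h) >= alpha * len(A)}


A = set(range(12)) | {40, 41}
alpha = Fraction(1, 2)
S = sym(A, alpha)

# E(A, A) as the sum of |A & (A + h)|^2 over h in A - A (Lemma 2.9).
E = sum(overlap(A, h) ** 2 for h in {a - b for a in A for b in A})

lower_bound_ok = alpha ** 2 * len(A) ** 2 * len(S) <= E
bound_2_21_ok = len(S) <= len(A) / alpha ** 2
```

For a finite set of integers only h = 0 satisfies A + h = A, so `sym(A, 1)` returns `{0}` here; in a finite group such as Z/nZ it could be a larger subgroup.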


Now let A, B be additive sets in an additive group Z. From Lemma 2.9 again, we have

$$E(A, B) = \sum_{b, b' \in B} |A \cap (A + b - b')|$$

and hence for any $0 < \alpha \leq 1$ we have

$$E(A, B) \leq |B|^2 \alpha |A| + |A| |\{(b, b') \in B \times B : b - b' \in \mathrm{Sym}_\alpha(A)\}|.$$

In particular, if $E(A, B) \geq 2\alpha|A||B|^2$, then we conclude that there is a set $G \subseteq B \times B$ of cardinality $|G| \geq \alpha|B|^2$ such that

$$B \overset{G}{-} B \subseteq \mathrm{Sym}_\alpha(A). \quad (2.22)$$


At first glance it seems that one may now be able to apply the symmetric Balog-Szemerédi-Gowers theorem. However, the fact that A is much larger than B means that $B \overset{G}{-} B$ may be much larger than B (compare (2.22) to (2.21)). To get around this difficulty we need to iterate this construction, and exploit the fact that $\mathrm{Sym}_\alpha(A)$ behaves like a group. This is already clear when $\alpha = 1$, when $\mathrm{Sym}_1(A)$ is indeed a genuine group; the following lemma shows that this behavior persists in an approximate sense for $\alpha$ less than 1. 


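Lemma 2.33 Let A be an additive set. Then we have

$$\mathrm{Sym}_{1-\varepsilon}(A) + \mathrm{Sym}_{1-\varepsilon'}(A) \subseteq \mathrm{Sym}_{1-\varepsilon-\varepsilon'}(A) \quad (2.23)$$

whenever $\varepsilon, \varepsilon' > 0$. Furthermore, if $0 < \alpha \leq 1$ and $S \subseteq \mathrm{Sym}_\alpha(A)$ is a non-empty set, then there exists a set $G \subseteq S \times S$ with

$$|G| \geq \alpha^2|S|^2/2 \quad (2.24)$$

such that

$$S \overset{G}{-} S \subseteq \mathrm{Sym}_{\alpha^2/2}(A). \quad (2.25)$$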


Proof To verify the first claim, observe that if $x \in \mathrm{Sym}_{1-\varepsilon}(A)$ and $y \in \mathrm{Sym}_{1-\varepsilon'}(A)$ then

$$|(A + x) \setminus A| = |A| - |A \cap (A + x)| \leq \varepsilon|A|$$

and

$$|(A + x) \setminus (A + x + y)| = |A| - |A \cap (A + y)| \leq \varepsilon'|A|,$$

and hence

$$|A \cap (A + x + y)| \geq |A \cap (A + x) \cap (A + x + y)| \geq (1 - \varepsilon - \varepsilon')|A|$$

which proves (2.23).

Now we prove the second claim. By definition of S, we see that for each $x \in S$ there exist at least $\alpha|A|$ elements $a \in A$ such that $a + x \in A$. Summing this over all x we see that

$$\sum_{a \in A} |\{x \in S : a + x \in A\}| \geq \alpha|A||S|.$$

Applying Cauchy-Schwarz we conclude that

$$\sum_{(x, y) \in S \times S} |\{a \in A : a + x, a + y \in A\}| = \sum_{a \in A} |\{x \in S : a + x \in A\}|^2 \geq \alpha^2|A||S|^2.$$

If we set $G \subseteq S \times S$ to be all the pairs (x, y) such that

$$|\{a \in A : a + x, a + y \in A\}| \geq \alpha^2|A|/2$$

then we have

$$|A||G| \geq \sum_{(x, y) \in G} |\{a \in A : a + x, a + y \in A\}| \geq \alpha^2|A||S|^2 - \frac{\alpha^2|A|}{2}|S|^2$$

which gives (2.24). Also, if $(x, y) \in G$ then $|A \cap (A + x - y)| \geq \alpha^2|A|/2$ by definition of G, which gives (2.25). 














Before we proceed with the main theorem, we need a technical lemma that uniformizes the size of the fibers $\{(a, a') \in G : a - a' = x\}$ of $A \overset{G}{-} A$.


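Lemma 2.34 (Dyadic pigeonhole principle) Let A be an additive set, and let $G \subseteq A \times A$ be such that $|G| \geq \alpha|A|^2$ and $|A \overset{G}{-} A| \leq L|A|$ for some $0 < \alpha \leq 1$ and $L \geq 1$. Then there exists a subset G' of G with

$$|G'| = \Omega\left(\frac{\alpha}{1 + \log\frac{1}{\alpha} + \log L}|A|^2\right)$$

and

$$|\{(a, a') \in G' : a - a' = x\}| \geq \frac{|G'|}{2|A \overset{G'}{-} A|}$$

for all $x \in A \overset{G'}{-} A$.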


It is important to note that the dependence on L only enters in a logarithmic 
manner. 


Proof Let D be the set of all x such that 


a|A|? a 
=~ Al 
2L|A| 2L 





H(a,a)eG:a—a' =x} > 


(thus D is the set of “popular differences”) and set G to be the pairs (a, a’) in 
z G 

G such that a — a’ € D. Then we have |G\G| < 5-|A||A — A| < a|A|*/2, and 

hence |G| > @|A|*/2. On the other hand, we have the crude upper bound 


Ha,a') € Č :a—a' = x}| < Ñ` {a € A : a = x +a'}] < JAI. 


a'EA 


2.6 Symmetry sets and imbalanced partial sum sets 87 


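Thus if we let $M$ be the least integer such that $2^{-M} < \frac{\alpha}{2L}$, we can partition $\tilde G = G_1 \cup \cdots \cup G_M$ where $G_m := \{(a,a') \in \tilde G : a - a' \in D_m\}$ and
\[ D_m := \{x \in A \overset{G}{-} A : 2^{-m}|A| < |\{(a,a') \in \tilde G : a - a' = x\}| \le 2^{-m+1}|A|\}. \]
By the pigeonhole principle, there exists $1 \le m \le M$ such that
\[ |G_m| \ge \frac{1}{M}|\tilde G| = \Omega\Big(\frac{\alpha}{1 + \log\frac{1}{\alpha} + \log L}\,|A|^2\Big). \]
By the definition of $D_m$, we have
\[ \frac{|G_m|}{2^{-m+1}|A|} \le |D_m| \le \frac{|G_m|}{2^{-m}|A|}; \]
since $D_m = A \overset{G_m}{-} A$, we thus see that
\[ |\{(a,a') \in G_m : a - a' = x\}| \ge 2^{-m}|A| \ge \frac{|G_m|}{2|A \overset{G_m}{-} A|} \]
for all $x \in A \overset{G_m}{-} A$. The claim then follows by setting $G' := G_m$.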
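The two steps of this proof (discard unpopular differences, then group the surviving fibers into dyadic size classes and keep the largest class) can be sketched as follows. This is only an illustration under stated assumptions: A is a list of distinct integers, G a set of pairs from A x A, and alpha and L are taken to be the exact ratios |G|/|A|^2 and |A -^G A|/|A|, so that the popularity threshold alpha|A|/(2L) becomes |G|/(2|A -^G A|). The function name `dyadic_refine` and the random test data are hypothetical.

```python
import random
from collections import defaultdict

def dyadic_refine(A, G):
    """Return the fibers {x: [(a, a'), ...]} of a dyadically uniform G' inside G."""
    n = len(A)
    fibers = defaultdict(list)
    for a, b in G:
        fibers[a - b].append((a, b))
    num_diffs = len(fibers)                      # |A -^G A|
    # Step 1: keep popular differences, |fiber| >= |G| / (2 |A -^G A|).
    popular = {x: f for x, f in fibers.items()
               if 2 * len(f) * num_diffs >= len(G)}
    # Step 2: bucket m >= 1 holds fibers with 2^-m |A| < |fiber| <= 2^(-m+1) |A|.
    buckets = defaultdict(dict)
    for x, f in popular.items():
        m = 1
        while len(f) * 2 ** m <= n:
            m += 1
        buckets[m][x] = f
    # Pigeonhole: the bucket carrying the most pairs plays the role of G'.
    return max(buckets.values(),
               key=lambda d: sum(len(f) for f in d.values()))

random.seed(0)
A = list(range(30))
G = {(a, b) for a in A for b in A if random.random() < 0.3}
G_prime = dyadic_refine(A, G)
size = sum(len(f) for f in G_prime.values())
print(size, len(G_prime))
```

Every fiber of the returned family has size at least |G'|/(2|A -^{G'} A|), which is the uniformity asserted in the lemma; the sketch does not track the logarithmic lower bound on |G'|.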
Now we give the main theorem of this section. 


Theorem 2.35 (Asymmetric Balog-Szemerédi-Gowers theorem) Let $A, B$ be additive sets in an additive group $Z$ such that $E(A, B) \ge 2\alpha|A||B|^2$ and $|A| \le L|B|$ for some $L \ge 1$ and $0 < \alpha \le 1$. Let $\varepsilon > 0$. Then there exists a $O_\varepsilon(\alpha^{-O_\varepsilon(1)}L^{\varepsilon})$-approximate group $H$ in $Z$, an additive set $X$ in $Z$ of cardinality $|X| = O_\varepsilon(\alpha^{-O_\varepsilon(1)}L^{\varepsilon}|A|/|H|)$ such that $|A \cap (X + H)| = \Omega_\varepsilon(\alpha^{O_\varepsilon(1)}L^{-\varepsilon}|A|)$, and an $x \in Z$ such that $|B \cap (x + H)| = \Omega_\varepsilon(\alpha^{O_\varepsilon(1)}L^{-\varepsilon}|B|)$.


Observe in the converse direction that if the conclusions of this theorem are true, then $E(A, B) = \Omega_\varepsilon(\alpha^{O_\varepsilon(1)}L^{-O(\varepsilon)}|A||B|^2)$ (Exercise 2.6.3 at the end of this section). Thus this theorem is sharp up to polynomial losses in $\alpha$ and $L^{\varepsilon}$, where $\varepsilon$ can be made arbitrarily small; the example in Exercise 2.6.1 can be adapted to show that this loss is necessary (Exercise 2.6.2).


Proof A direct application of Theorem 2.31 will lose far too many powers of L. 
The trick is to embed $B$ in a long increasing sequence of sets $B_0, B_1, B_2, \ldots$, with
each $B_j$ being (roughly speaking) a partial difference set of the previous one, and
use the pigeonhole principle to show that at some stage the ratio $|B_{j+1}|/|B_j|$ is
bounded by a small power of L. One can then apply Theorem 2.31 with acceptable 
losses and conclude the theorem. (This method of proof is inspired by a similar 
argument in [40].) 




We turn to the details. It will be convenient to use a variant of the Landau $O()$ and $\Omega()$ notation which can absorb factors of $\alpha$ and $\log L$ (which we think of as being relatively close to 1). If $X, Y$ are non-negative quantities and $j$ is a parameter, let us say that $X = \tilde O_j(Y)$ or $Y = \tilde\Omega_j(X)$ if one has an estimate of the form
\[ X \le C(j)\,\alpha^{-C(j)}\,Y \log^{C(j)} L \]
for some $C(j) > 0$ depending only on $j$.

Let $J = J(\varepsilon) \ge 1$ be a large integer to be chosen later. Let $1 \ge \alpha_1 > \cdots > \alpha_{J+1} > 0$ be the sequence defined recursively by $\alpha_1 := \alpha$ and $\alpha_{j+1} := \alpha_j^2/2$ for all $1 \le j \le J$. From induction we see that $\alpha_j = \tilde\Omega_j(1)$. We claim that we can find a sequence $B_0, B_1, \ldots, B_J, B_{J+1}$ of additive sets in $Z$ with the following properties.


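• $B_0 = B$, and for all $1 \le j \le J + 1$ we have
\[ B_j \subseteq \mathrm{Sym}_{\alpha_j}(A). \qquad (2.26) \]
• For all $0 \le j \le J + 1$, we have
\[ \alpha_j^{-1}L|B| \ge |B_j| = \tilde\Omega_j(|B|). \qquad (2.27) \]
• For all $0 \le j \le J$, there exists $G_j \subseteq B_j \times B_j$ such that
\[ |G_j| = \tilde\Omega_j(|B_j|^2) \qquad (2.28) \]
and
\[ B_{j+1} = B_j \overset{G_j}{-} B_j. \qquad (2.29) \]
Furthermore, for all $x \in B_{j+1}$, we have
\[ |\{(b,b') \in G_j : b - b' = x\}| = \tilde\Omega_j\Big(\frac{|B_j|^2}{|B_{j+1}|}\Big). \qquad (2.30) \]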


We construct the $B_j$ as follows. We set $B_0 := B$. From (2.22) followed by Lemma 2.34 we can construct $G_0 \subseteq B_0 \times B_0$ and $B_1 := B_0 \overset{G_0}{-} B_0$ obeying (2.26), (2.28), (2.29), (2.30). Since each element in $B_0 \overset{G_0}{-} B_0$ can be represented as a difference of a pair in $G_0$ in at most $|B_0|$ ways, we have
\[ |B_1| = |B_0 \overset{G_0}{-} B_0| \ge |G_0|/|B_0| = \tilde\Omega_1(|B|), \]
which is the lower bound in (2.27); the upper bound follows from (2.26) and (2.21).

Next, suppose inductively that $B_j \subseteq \mathrm{Sym}_{\alpha_j}(A)$ has already been chosen for some $1 \le j \le J$. Applying Lemma 2.33 (with $S := B_j$) followed by Lemma 2.34, and using the cardinality bounds already obtained in (2.27) and the construction $\alpha_{j+1} = \alpha_j^2/2$ of the $\alpha_j$, we can thus find $G_j \subseteq B_j \times B_j$ and $B_{j+1} := B_j \overset{G_j}{-} B_j$ obeying (2.26), (2.28), (2.29), (2.30). This closes the induction and so we can construct the $B_j$ for all $0 \le j \le J + 1$, and similarly obtain the $G_j$ for all $1 \le j \le J$.

Now for the crucial step (which explains why we iterated the above procedure so many times). From (2.27) and the pigeonhole principle, there exists $1 \le j \le J$ such that
\[ |B_{j+1}| = \tilde O_J(L^{C_0/J}|B_j|); \]
the point is that we have managed to replace $L$ by the substantially smaller quantity $L^{C_0/J}$. If we now apply (2.29), (2.28), and Theorem 2.31, we can thus find a $\tilde O_J(L^{C_0/J})$-approximate group $H$ of cardinality
\[ |H| = \tilde O_J(L^{C_0/J}|B_j|) \qquad (2.31) \]
and an $x_j \in Z$ such that
\[ |B_j \cap (H + x_j)| = \tilde\Omega_J(L^{-C_0/J}|B_j|) \qquad (2.32) \]
for some absolute constant $C_0$. It remains to relate $H$ to $B$ and to $A$. We begin with $B$. From (2.32) and (2.30) (with $j$ replaced by $j - 1$) we have
\[ |\{(b,b') \in G_{j-1} : b - b' \in B_j \cap (H + x_j)\}| = \tilde\Omega_J(L^{-C_0/J}|B_{j-1}|^2), \]
so in particular
\[ |\{(b,b') \in B_{j-1} \times B_{j-1} : b \in H + x_j + b'\}| = \tilde\Omega_J(L^{-C_0/J}|B_{j-1}|^2). \]
Thus by the pigeonhole principle, there exists a $b'$ such that
\[ |\{b \in B_{j-1} : b \in H + x_j + b'\}| = \tilde\Omega_J(L^{-C_0/J}|B_{j-1}|). \]
Thus if we set $x_{j-1} := x_j + b'$ then we have
\[ |B_{j-1} \cap (H + x_{j-1})| = \tilde\Omega_J(L^{-C_0/J}|B_{j-1}|). \qquad (2.33) \]
We now repeat this argument with $j$ replaced by $j - 1$ and (2.32) replaced by (2.33). Iterating this at most $J$ times, we eventually locate an $x = x_0 \in Z$ such that
\[ |B \cap (H + x)| = \tilde\Omega_J(L^{-C_0/J}|B|), \]
which gives the desired control on $B$ if $J$ is sufficiently large depending on $\varepsilon$.
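It remains to control $A$. From (2.32), (2.31) and (2.26) we have
\[ |\{y \in H + x_j : y \in \mathrm{Sym}_{\alpha_j}(A)\}| = \tilde\Omega_J(L^{-2C_0/J}|H|) \]
and thus by definition of $\mathrm{Sym}_{\alpha_j}(A)$ and $\alpha_j$
\[ |\{(a, y) \in A \times (H + x_j) : a + y \in A\}| = \tilde\Omega_J(L^{-2C_0/J}|H||A|). \]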




We rewrite this as
\[ \sum_{x \in x_j + A} |A \cap (H + x)| = \tilde\Omega_J(L^{-2C_0/J}|H||A|). \]
We can therefore find a subset $X_0$ of $x_j + A$ with
\[ |X_0| = \tilde\Omega_J(L^{-2C_0/J}|A|) \qquad (2.34) \]
such that
\[ |A \cap (H + x)| = \tilde\Omega_J(L^{-2C_0/J}|H|) \quad \text{for all } x \in X_0. \]

Now we use an argument similar to that used to prove Ruzsa's covering lemma (Lemma 2.14). Let $X$ be a subset of $X_0$ such that the sets $\{H + x : x \in X\}$ are all disjoint, and which is maximal with respect to set inclusion. Then we have
\[ |A \cap (H + X)| = \sum_{x \in X} |A \cap (H + x)| = \tilde\Omega_J(L^{-2C_0/J}|H||X|). \qquad (2.35) \]
On the other hand, if $y \in X_0$, then by maximality of $X$ there exists $x \in X$ such that $x + H$ intersects $y + H$. In other words, $X_0$ is covered by $X + H - H$, and hence (since $H$ is a $\tilde O_J(L^{C_0/J})$-approximate group)
\[ |X_0| \le |X||H - H| = \tilde O_J(|X|L^{C_0/J}|H|). \qquad (2.36) \]
Combining (2.34), (2.35), (2.36) we see that $X$ obeys all the desired properties, if $J$ is chosen sufficiently large depending on $\varepsilon$.














The above theorem can also be put in a form resembling Theorem 2.29: 


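Corollary 2.36 Let $A, B$ be additive sets with common ambient group such that $E(A, B) \ge 2\alpha|A||B|^2$ and $|A| \le L|B|$ for some $L \ge 1$ and $0 < \alpha \le 1$. Let $\varepsilon > 0$. Then there exist subsets $A' \subseteq A$ and $B' \subseteq B$ such that
\[ |A'| = \Omega_\varepsilon(\alpha^{O_\varepsilon(1)}L^{-\varepsilon}|A|) \]
\[ |B'| = \Omega_\varepsilon(\alpha^{O_\varepsilon(1)}L^{-\varepsilon}|B|) \]
\[ |A' + nB' - mB'| = O_\varepsilon(\alpha^{-O_\varepsilon(1)}L^{\varepsilon})^{n+m}|A| \]
for all integers $n, m \ge 0$.


Proof Apply Theorem 2.35 and set $A' := A \cap (X + H)$ and $B' := B \cap (x + H)$.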














Because of (2.8), the above results give some partial results concerning the 
situation when |A + B| < K|A| and |A| is much larger than |B |, but these results 
will be rather weak. We will give a better result concerning this problem in 
Section 6.5, once we develop the Plünnecke inequalities.




Exercises

2.6.1 [297] Let $n$ be a large integer, and let $Z := \mathbf{Z}^{2n}$. Let $A$ be the additive set
\[ A := \{(x_1, x_2, \ldots, x_{2n}) \in \mathbf{Z}^{2n} : x_1 + \cdots + x_{2n} = n;\ x_1, \ldots, x_{2n} \ge 0\} \]
and let $B := \{e_1, \ldots, e_{2n}\}$. Show that $|B| = 2n$, that $|A| = (27/4)^{n+o(n)}$, that $|A + B| = O(|A|)$, but that $|A - B| \ge n|A|$. (You may find Stirling's formula (1.52) to be useful.)

2.6.2 Modify Exercise 2.6.1 to show that one cannot take $\varepsilon = 0$ in Theorem 2.35.

2.6.3 Let $A, B$ be additive sets and let $\varepsilon > 0$, $0 < \alpha \le 1$, and $L \ge 1$ be such that the conclusions of Theorem 2.35 are satisfied. Conclude that $E(A, B) = \Omega_\varepsilon(\alpha^{O_\varepsilon(1)}L^{-O(\varepsilon)}|A||B|^2)$.

2.6.4 Let $A$ be an additive set. By modifying the proof of Lemma 2.13, establish the inequality
\[ |A - A + n\,\mathrm{Sym}_\alpha(A)| \le \frac{\delta[A]^{n+1}}{\alpha^n}|A| \]
for all integers $n \ge 0$ and all $0 < \alpha \le 1$.

2.6.5 [220] Let $A$ be an additive set such that $A - A$ is not a group. Show that there exists $h \in A - A$ such that $1 \le |A \cap (A + h)| \le |A|/2$. (Hint: argue by contradiction, and analyze $\mathrm{Sym}_\alpha(A)$ for some $\alpha$ slightly greater than 1/2.) Conclude in particular that if $|A - A| < \frac{3}{2}|A|$, then $A - A$ is a group. Note that the example $A = \{0, 1\} \subset \mathbf{Z}$ shows that the constant $\frac{3}{2}$ cannot be improved; one can also make this example larger, for instance by taking the Cartesian product of $\{0, 1\}$ with a finite group. For a more refined estimate on $A - A$, see Theorem 5.5 and Corollary 5.6.

2.6.6 Let $A, B$ be additive sets with common ambient group such that $|A + B| \le K|A|$ and $|A| \le L|B|$ for some $K, L \ge 1$. Let $\varepsilon > 0$. Show that there exists a $O_\varepsilon(K^{O_\varepsilon(1)}L^{\varepsilon})$-approximate group $H$ such that $B$ is contained in a translate of $H$, and that $A$ is contained in at most $O_\varepsilon(K^{O_\varepsilon(1)}L^{\varepsilon}|A|/|H|)$ translates of $H$; compare this with Proposition 2.2. (Hint: Apply Theorem 2.35 and the Ruzsa covering lemma (Lemma 2.14).)

2.6.7 Let $A$ be an additive set, and let $B$ be a subset of $A$ such that $|B| \ge (1 - \varepsilon)|A|$ for some $0 < \varepsilon < 1$. Prove that
\[ \mathrm{Sym}_{\alpha/(1-\varepsilon)}(B) \subseteq \mathrm{Sym}_\alpha(A) \subseteq \mathrm{Sym}_{(\alpha-2\varepsilon)/(1-\varepsilon)}(B) \]
for every $\alpha \in \mathbf{R}$.




2.6.8 Let $A$ be an additive set. Refine (2.21) slightly to
\[ |\mathrm{Sym}_\alpha(A)| \le 1 + \frac{|A| - 1}{\alpha} \quad \text{for all } \alpha > 0. \]

2.6.9 [350] Let $A, B$ be additive sets in $\mathbf{Z}$, such that $B$ consists entirely of positive numbers. Show that there exists $b \in B$ such that
\[ |A \cap (A + b)| \le \frac{|A| - 1}{|B|} \cdot \frac{|A|}{2}. \]
(Hint: use Exercise 2.6.8, and exploit the fact that only half of the elements of $\mathrm{Sym}_\alpha(A) \setminus \{0\}$ are positive.)

2.6.10 [44] Let $A$ be an additive set such that $|A + A| \le K|A|$ for some $K \ge 1$. Let $G$ be the group generated by $\mathrm{Sym}_{2/9K}(A)$. Show that there exists a coset $x + G$ of $G$ such that $|A \cap (x + G)| \ge |A|/3$. (Hint: suppose for contradiction that $|A \cap (x + G)| < |A|/3$ for all $x$. Use the greedy algorithm to partition $A = A' \cup A''$ where $|A|/3 \le |A'|, |A''| \le 2|A|/3$ and such that $A' - A''$ is disjoint from $G$ (and thus disjoint from $\mathrm{Sym}_{2/9K}(A)$). Use this to obtain an upper bound on $E(A', A'')$ and use (2.8) to obtain a contradiction.)


2.7 Non-commutative analogues 


Many of the above arguments carry over to the non-commutative setting, though 
one of course now needs to take care with the ordering of multiplication. We sketch 
some of the main points here and leave the details as exercises. For further details 
see [362]. 


Definition 2.37 A multiplicative group is any group $G$ (not necessarily abelian) with group operation $\cdot$, with inversion operation $x \mapsto x^{-1}$, and identity element 1. A multiplicative set is a pair $(A, G)$, where $G$ is a multiplicative group, and $A$ is a finite non-empty subset of $G$. We often abbreviate a multiplicative set $(A, G)$ simply as $A$, and refer to $G$ as the ambient group.


If $A$ and $B$ are multiplicative sets with common ambient group $G$, we define their product set
\[ A \cdot B := \{ab : a \in A, b \in B\} \]
and the inverse set
\[ A^{-1} := \{a^{-1} : a \in A\}. \]




We also define right translates $A \cdot x$ and left translates $x \cdot A$ for $x \in G$ in the usual manner. Note that $x \cdot A \ne A \cdot x$ and $A \cdot B \ne B \cdot A$ in general, although we do have $|A| = |x \cdot A| = |A \cdot x| = |A^{-1}|$. We also define iterated product sets $A^n := A \cdot \ldots \cdot A$ for $n \ge 1$, with the conventions that $A^0 := \{1\}$ and $A^{-n} := (A^n)^{-1} = (A^{-1})^n$.

We remark that $A \cdot B$ and $B \cdot A$ may have widely different cardinalities; for instance if $H$ is a finite subgroup of $G$ and $x$ is an element of $G$ that does not lie in the normalizer $N(H) := \{x \in G : xH = Hx\}$ of $H$, then $H \cdot (x \cdot H)$ and $(x \cdot H) \cdot H$ can have very different cardinalities. However, we still have the analogue of (2.1):
\[ \max(|A|, |B|) \le |A \cdot B|, |B \cdot A| \le |A||B|; \]


see exercises. 
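The smallest instance of this asymmetry already occurs in the symmetric group on three letters. The sketch below is illustrative only: permutations are stored as tuples of images, `mul` and `prod` are hypothetical helper names, and the choice of $H$ (generated by a transposition) and $x$ (a 3-cycle) is ours rather than the text's.

```python
def mul(p, q):
    """Composition p∘q of permutations stored as tuples: (p∘q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def prod(A, B):
    """Product set A · B = {ab : a in A, b in B}."""
    return {mul(a, b) for a in A for b in B}

e = (0, 1, 2)
H = {e, (1, 0, 2)}        # subgroup generated by the transposition swapping 0 and 1
x = (1, 2, 0)             # a 3-cycle, which does not normalize H

xH = prod({x}, H)         # left coset x · H
Hx = prod(H, {x})         # right coset H · x
left = prod(H, xH)        # H · (x · H), a double coset
right = prod(xH, H)       # (x · H) · H, which collapses back to x · H

print(xH != Hx, len(left), len(right))
```

Here $H \cdot (x \cdot H)$ has $|H|^2 = 4$ elements while $(x \cdot H) \cdot H$ has only $|H| = 2$; both sizes lie between $\max(|A|,|B|)$ and $|A||B|$ as the displayed inequality requires.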
We define the (left-invariant) Ruzsa distance $d(A, B)$ between two multiplicative sets:
\[ d(A, B) := \log \frac{|A \cdot B^{-1}|}{|A|^{1/2}|B|^{1/2}}. \]
This distance still obeys the Ruzsa triangle inequality, mainly thanks to the identity $(ab^{-1})(bc^{-1}) = ac^{-1}$. It is left-invariant in each variable, thus $d(x \cdot A, B) = d(A, x \cdot B) = d(A, B)$, and is jointly right-invariant, $d(A \cdot x, B \cdot x) = d(A, B)$, but is not separately right-invariant in each variable. Also it is not reflection invariant; the metric $d^*(A, B) := d(A^{-1}, B^{-1})$ is the right-invariant Ruzsa distance,
which we will not use here. 
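After clearing logarithms, the triangle inequality $d(A, C) \le d(A, B) + d(B, C)$ is equivalent to $|A \cdot C^{-1}||B| \le |A \cdot B^{-1}||B \cdot C^{-1}|$, which can be tested directly. The following sketch does so on random subsets of the symmetric group $S_4$; the group, the helper names and the random sampling are illustrative assumptions and not part of the text.

```python
import random
from itertools import permutations

def mul(p, q):
    """Composition p∘q of permutations stored as tuples."""
    return tuple(p[i] for i in q)

def inv(p):
    """Inverse permutation."""
    r = [0] * len(p)
    for i, image in enumerate(p):
        r[image] = i
    return tuple(r)

def prod(A, B):
    return {mul(a, b) for a in A for b in B}

def quotient_size(A, B):
    """|A · B^{-1}|, the quantity inside the Ruzsa distance d(A, B)."""
    return len(prod(A, {inv(b) for b in B}))

def triangle_holds(A, B, C):
    """Integer form of d(A, C) <= d(A, B) + d(B, C)."""
    return quotient_size(A, C) * len(B) <= quotient_size(A, B) * quotient_size(B, C)

random.seed(1)
S4 = list(permutations(range(4)))
triples = [tuple(set(random.sample(S4, random.randint(1, 8))) for _ in range(3))
           for _ in range(200)]
print(all(triangle_holds(A, B, C) for A, B, C in triples))
```

Working with the integer inequality avoids computing logarithms, so the check is exact.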

Define a multiplicative $K$-approximate group to be any multiplicative set $H$ which is symmetric (so $H = H^{-1}$), contains the identity, and is such that there exists a set $X$ of cardinality $|X| \le K$ such that we have the inclusions
\[ H \cdot H \subseteq X \cdot H \subseteq H \cdot X \cdot X; \qquad H \cdot H \subseteq H \cdot X \subseteq X \cdot X \cdot H. \]
We can characterize when $d(A, B)$ is zero:


Proposition 2.38 Let $A, B$ be multiplicative sets in an ambient group $G$. Then $d(A, B) = 0$ if and only if $A$ and $B$ are both left cosets of the same finite subgroup $H$, thus $A = x \cdot H$ and $B = y \cdot H$ for some $x, y \in G$.


We leave the proof as an exercise. Observe that $d(A, B) = 0$ does not necessarily imply that $A$ or $B$ has small doubling; if $x$ or $y$ lie outside the normalizer of $H$ then $A^2$ or $B^2$ can be significantly larger than $A$ or $B$. Similarly we see that $d(A, B) = 0$ does not imply that $d(A, B^{-1}) = 0$. So there does not appear to be an analogue of Corollary 2.12. However, with some care and a few new arguments, we can still obtain the analogues of the results from Sections 2.4 and 2.5. Let us start with the analogue of Ruzsa's covering lemma, which can be proved by the same argument.




Lemma 2.39 Let $A, B$ be multiplicative sets in an ambient group $G$ such that $|A \cdot B| \le K|A|$. Then there exists a finite set $X$ in $B$ of cardinality at most $K$ such that $B \subseteq A^{-1} \cdot A \cdot X$.


From Section 2.4, we know that if $A$ is a subset of a commutative group $G$ and $|A + A| \le K|A|$, then $|nA - mA| \le O(K^{O(n+m)}|A|)$ for any $n, m$. This no longer holds in a non-commutative setting. Consider for instance $A := H \cup \{x\}$ where $H$ is a subgroup of $G$ and $x$ lies outside the normalizer $N(H)$ of $H$. Then $A \cdot A = H \cup (x \cdot H) \cup (H \cdot x) \cup \{x^2\}$, so $|A \cdot A| \le 3|A| - 2$; but $A \cdot A \cdot A$ contains $H \cdot x \cdot H$ which can be as large as $|H|^2 = (|A| - 1)^2$. Interestingly, it turns out that if we assume that $|A \cdot A \cdot A|$ is small, then the problem disappears and we can obtain the following analogue of Proposition 2.26.
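The example $A = H \cup \{x\}$ can be made concrete in the symmetric group $S_4$. In the sketch below the cyclic subgroup $H$ generated by a 4-cycle and the transposition $x$ are our own illustrative choices (not from the text), as are the helper names.

```python
def mul(p, q):
    """Composition p∘q of permutations stored as tuples."""
    return tuple(p[i] for i in q)

def prod(A, B):
    return {mul(a, b) for a in A for b in B}

def cyclic_subgroup(g):
    """The subgroup {1, g, g^2, ...} generated by a single permutation g."""
    identity = tuple(range(len(g)))
    H, p = {identity}, g
    while p != identity:
        H.add(p)
        p = mul(p, g)
    return H

g = (1, 2, 3, 0)          # the 4-cycle 0 -> 1 -> 2 -> 3 -> 0
x = (1, 0, 2, 3)          # the transposition swapping 0 and 1
H = cyclic_subgroup(g)
A = H | {x}

AA = prod(A, A)
AAA = prod(AA, A)
HxH = prod(prod(H, {x}), H)

print(len(A), len(AA), len(AAA), len(HxH))
```

With these choices $|A \cdot A|$ stays below $3|A| - 2$, while the double coset $H \cdot x \cdot H$ sitting inside $A \cdot A \cdot A$ attains the full size $|H|^2 = (|A| - 1)^2$.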


Proposition 2.40 Let $A$ be a multiplicative set in a group $G$, and let $K \ge 1$. Then the following statements are equivalent up to constants, in the sense that if the $j$th property holds for some positive absolute constant $C_j$, then the $k$th property will also hold for some absolute constant $C_k$ depending on $C_j$:

(i) $|A \cdot A \cdot A| \le K^{C_1}|A|$;
(ii) We have $|A^{\epsilon_1} \cdots A^{\epsilon_n}| \le K^{C_2 n}|A|$ for all $n \ge 1$ and all signs $\epsilon_1, \ldots, \epsilon_n \in \{-1, 1\}$;
(iii) there exists a $K^{C_3}$-approximate group $H$ containing $A$ where $|H| \le K^{C_3}|A|$.


Proof First we show that (i) implies (ii). Assuming (i), we have $|A \cdot A| \le |A \cdot A \cdot A| \le K^{C_1}|A|$. It follows that $d(A, A^{-1})$ (which equals $d(A^{-1}, A)$) and $d(A \cdot A, A^{-1})$ are $O(\log K)$. By the triangle inequality $d(A \cdot A, A) = O(\log K)$, which implies $|A \cdot A \cdot A^{-1}| \le K^{O(1)}|A|$ and $d(A, A \cdot A^{-1}) = O(\log K)$. Again by the triangle inequality, we have $d(A \cdot A^{-1}, A^{-1}) = O(\log K)$, which implies $|A \cdot A^{-1} \cdot A| \le K^{O(1)}|A|$. By a similar argument, we can show that $|A^{-1} \cdot A \cdot A| \le K^{O(1)}|A|$. With these bounds (and taking inverses) we obtain the statement of (ii) for $n = 3$. From here, it is easy to finish the proof by induction on $n$, with $n = 3$ being the base case. (For $n = 2$, the statement in (ii) is trivial.)

Next, we prove that (ii) implies (iii). Set $H' = A \cup \{1\} \cup A^{-1}$ and $H = H' \cdot H' \cdot H'$. Clearly $H$ is symmetric and contains $A$. By (ii), $|H| \le K^{O(1)}|A|$. It thus remains to show that $H$ is a $K^{O(1)}$-approximate group. Notice that $|H' \cdot H \cdot H| \le K^{O(1)}|A|$. By the covering lemma, we have a set $Y$ of cardinality $K^{O(1)}$ in $H \cdot H$ such that
\[ H \cdot H \subseteq H'^{-1} \cdot H' \cdot Y. \]
Notice that the right-hand side is a subset of $H \cdot Y$. Now set $X = Y \cup Y^{-1}$. Since both $H$ and $X$ are symmetric, $H \cdot H$ is contained in both $H \cdot X$ and $X \cdot H$.




Moreover, as $X \subseteq H \cdot H$,
\[ H \cdot X \subseteq H \cdot H \cdot H \subseteq X \cdot H \cdot H \subseteq X \cdot X \cdot H \]


completing the proof. 
The remaining implications are straightforward and left as an exercise. 














Now we are going to prove we can still obtain (iii) under the assumption 
that d(A, B) = O(log K) for some set B. We will need the following variant of 
Lemma 2.13, whose proof we leave as an exercise. 


Lemma 2.41 Let $A$ be a multiplicative set. Then there exists a symmetric set $S \subseteq A^{-1} \cdot A$ such that $|S| \ge |A|/2$ and
\[ |A \cdot S^n \cdot A^{-1}| \le 2^n \frac{|A \cdot A^{-1}|^{2n+1}}{|A|^{2n}} \]
for all integers $n \ge 0$.

As $d(A, A) \le 2d(A, B)$, this implies


Corollary 2.42 Let $A$ be a multiplicative set such that $d(A, B) \le \log K$ for some $K \ge 1$. Then there exists a symmetric set $S$ such that $|S| \ge \Omega(K^{-O(1)}|A|)$ and
\[ |A \cdot S^n \cdot A^{-1}| \le K^{O(n)}|A| \]
for all integers $n \ge 0$.


Proposition 2.43 Let $A, B$ be multiplicative sets in a group $G$, and let $K \ge 1$. Then the following statements are equivalent up to constants, in the sense that if the $j$th property holds for some absolute constant $C_j$, then the $k$th property will also hold for some absolute constant $C_k$ depending on $C_j$:

(i) $d(A, B) \le C_1(1 + \log K)$;

(ii) there exists a $C_2K^{C_2}$-approximate group $H$ such that $|H| \le C_2K^{C_2}|A|$, $A \subseteq X \cdot H$ and $B \subseteq Y \cdot H$ for some multiplicative sets $X, Y$ of cardinality at most $C_2K^{C_2}$.


Proof We only need to prove that (i) implies (ii), as the reverse implication is trivial. Notice that (i) implies $d(A, A) = O(\log K)$. Thus, we have a symmetric set $S$ of cardinality $K^{-O(1)}|A|$ such that
\[ |A \cdot S^3 \cdot A^{-1}| \le K^{O(1)}|A|. \]
This implies that $|A \cdot S| \le K^{O(1)}|A|$ and thus $d(A, S) = O(\log K)$. Furthermore, $|S^3| \le K^{O(1)}|S|$ so we can find a $O(K^{O(1)})$-approximate group $H$ of size $K^{O(1)}|A|$ containing $S$. This, in particular, implies that $d(S, H^{-1}) = O(\log K)$. By the triangle inequality, $d(A, H^{-1}) = O(\log K)$, which yields $|A \cdot H| \le K^{O(1)}|A|$. By the covering lemma, there is a set $Y$ of cardinality $K^{O(1)}$ such that $A \subseteq Y \cdot H \cdot H^{-1}$. But as $H$ is an approximate group, $H^{-1} = H$ and $H \cdot H \subseteq Z \cdot H$ for some set $Z$ of size $K^{O(1)}$. Thus, $A \subseteq (Y \cdot Z) \cdot H$, where $|Y \cdot Z| \le |Y||Z| = K^{O(1)}$. The conclusion for $B$ can be proved similarly.














Let us now consider the non-commutative version of the Balog-Szemerédi-Gowers
theorem. Theorem 2.29 still holds when the ambient group Z is
non-commutative. The proof of this theorem is purely graph-theoretical (see 
Section 6.4) and has little to do with the commutativity of the group. 


Theorem 2.44 (Balog-Szemerédi-Gowers theorem, non-commutative version) Let $A, B$ be multiplicative sets in an ambient group $Z$, and let $G \subseteq A \times B$ be such that
\[ |G| \ge |A||B|/K \quad \text{and} \quad |A \overset{G}{\cdot} B| \le K'|A|^{1/2}|B|^{1/2} \]
for some $K \ge 1$ and $K' > 0$. Then there exist subsets $A' \subseteq A$, $B' \subseteq B$ such that
\[ |A'| \ge \frac{|A|}{4\sqrt{2}K} \qquad (2.37) \]
\[ |B'| \ge \frac{|B|}{4K} \qquad (2.38) \]
\[ |A' \cdot B'| \le 2^{12}K^4(K')^3|A|^{1/2}|B|^{1/2}. \qquad (2.39) \]
In particular we have
\[ d(A', B'^{-1}) \le 5\log K + 3\log K' + O(1). \]

Define the multiplicative energy $E(A, B)$ between two multiplicative sets $A, B$ with common ambient group to be
\[ E(A, B) := |\{(a, a', b, b') \in A \times A \times B \times B : ab = a'b'\}|. \qquad (2.40) \]


A significant difficulty here is that $E(A, B)$ obeys far fewer symmetries in the
non-commutative case than in the commutative case; indeed, the only symmetry
available is that $E(A, B) = E(B^{-1}, A^{-1})$. However in the case when $B = A^{-1}$ we
have a crucial additional identity $E(A, A^{-1}) = E(A^{-1}, A)$ (see exercises), which
can be thought of as a very weak, restricted form of commutativity. 
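Both identities, together with the Cauchy-Schwarz lower bound $E(A, B) \ge |A|^2|B|^2/|A \cdot B|$, are easy to test by brute force. The sketch below uses random subsets of the symmetric group $S_4$; the group, the sampling scheme and the helper names are illustrative assumptions, not taken from the text.

```python
import random
from collections import Counter
from itertools import permutations

def mul(p, q):
    """Composition p∘q of permutations stored as tuples."""
    return tuple(p[i] for i in q)

def inv(p):
    r = [0] * len(p)
    for i, image in enumerate(p):
        r[image] = i
    return tuple(r)

def inverse_set(A):
    return {inv(a) for a in A}

def energy(A, B):
    """E(A, B) = #{(a, a', b, b') : ab = a'b'}, computed as the sum of r(x)^2
    where r(x) counts the representations x = ab."""
    counts = Counter(mul(a, b) for a in A for b in B)
    return sum(c * c for c in counts.values())

random.seed(2)
S4 = list(permutations(range(4)))
pairs = [(set(random.sample(S4, random.randint(1, 10))),
          set(random.sample(S4, random.randint(1, 10)))) for _ in range(100)]

symmetry_ok = all(energy(A, B) == energy(inverse_set(B), inverse_set(A))
                  for A, B in pairs)
weak_commutativity_ok = all(energy(A, inverse_set(A)) == energy(inverse_set(A), A)
                            for A, _ in pairs)
print(symmetry_ok, weak_commutativity_ok)
```

Note that `energy(A, B)` and `energy(B, A)` are in general different numbers, which is the asymmetry the paragraph above describes.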

The following variant of Lemma 2.30 holds, with basically the same proof. 


Lemma 2.45 Let $A, B$ be multiplicative sets in an ambient group $Z$, and let $G$ be a non-empty subset of $A \times B$. Then
\[ E(A, B) \ge \frac{|G|^2}{|A \overset{G}{\cdot} B|}. \]
Conversely, if $E(A, B) \ge |A|^{3/2}|B|^{3/2}/K$ for some $K \ge 1$, then there exists $G \subseteq A \times B$ such that
\[ |G| \ge |A||B|/2K; \qquad |A \overset{G}{\cdot} B| \le 2K|A|^{1/2}|B|^{1/2}. \]
Finally, notice that by the triangle inequality
\[ d(A', A') \le d(A', B'^{-1}) + d(B'^{-1}, A') = 2d(A', B'^{-1}), \]
which means that if $d(A', B'^{-1})$ is small, then $d(A', A')$ is also small. From here, we can use the same arguments as in the commutative case to deduce


Corollary 2.46 Let $A, B$ be multiplicative sets in an ambient group $Z$ such that $E(A, B) \ge |A|^{3/2}|B|^{3/2}/K$ for some $K \ge 1$. Then there exists a subset $A' \subseteq A$ such that $|A'| = \Omega(K^{-C}|A|)$ and $|A' \cdot (A')^{-1}| = O(K^{C}|A|)$ for some absolute constant $C$.


Combining this with the identity $E(A, A^{-1}) = E(A^{-1}, A)$ we obtain the following weak commutativity property between $A$ and $A^{-1}$:


Corollary 2.47 Let $A$ be a multiplicative set such that $|A \cdot A| \le K|A|$ for some $K \ge 1$. Then there exists a subset $A' \subseteq A$ such that $|A'| = \Omega(K^{-O(1)}|A|)$ and $|A' \cdot (A')^{-1}| = O(K^{O(1)}|A|)$.


It is now not too hard to obtain the following theorem. 


Theorem 2.48 Let $A, B$ be multiplicative sets in a group $G$, and let $K \ge 1$. Then the following statements are equivalent up to constants, in the sense that if the $j$th property holds for some absolute constant $C_j$, then the $k$th property will also hold for some absolute constant $C_k$ depending on $C_j$:

(i) $E(A, B) \ge C_1^{-1}K^{-C_1}|A|^{3/2}|B|^{3/2}$;
(ii) there exists a subset $G \subseteq A \times B$ with $|G| \ge C_2^{-1}K^{-C_2}|A||B|$ such that $|A \overset{G}{\cdot} B| \le C_2K^{C_2}|A|^{1/2}|B|^{1/2}$;
(iii) there exists a $C_3K^{C_3}$-approximate group $H$ and $x, y \in G$ such that $|H| \le C_3K^{C_3}|A|^{1/2}|B|^{1/2}$ and
\[ |A \cap (x \cdot H)|, |B \cap (H \cdot y)| \ge C_3^{-1}K^{-C_3}|A|^{1/2}|B|^{1/2}. \]


We leave the proofs of these statements to the exercises. Despite these char- 
acterizations, there is much left to be done in the study of product sets in non- 
commutative groups. For instance we do not currently have a satisfactory version 
of Freiman’s theorem in general. However there has been some progress in the case 
of very small doubling [172] and also in certain special groups such as $SL_2(\mathbf{Z})$ or
free groups; see for instance [78], [182]. 




Exercises 

2.7.1 Prove a multiplicative version of Lemma 2.1. 

2.7.2 Prove a multiplicative version of Lemma 2.6. 

2.7.3 Prove Proposition 2.38. 

2.7.4 Let $(A, G)$ be a multiplicative set. Prove that $|A \cdot A| = |A|$ if and only if $A$ is a normal coset of some finite subgroup $H$ of $G$, i.e. $A = x \cdot H = H \cdot x$ for some $x \in N(H)$.

2.7.5 Let $A$ be a symmetric multiplicative set, so $A = A^{-1}$, and let $\sigma_n[A]$ denote the $n$-fold doubling numbers $|A^n|/|A|$. Using the Ruzsa triangle inequality, show that $\sigma_{m+n-2}[A] \le \sigma_m[A]\sigma_n[A]$ for all $m, n \ge 2$.

2.7.6 Let $A$ and $B$ be multiplicative sets. Establish the identities $E(A, B) = E(B^{-1}, A^{-1})$ and $E(A, A^{-1}) = E(A^{-1}, A)$, and the inequality $E(A, B) \ge \frac{|A|^2|B|^2}{|A \cdot B|}$.

2.7.7 Let A, B, C be additive sets in an ambient group Z, let0 < £ < 1/4, and 
let G C A x B7!, H C B x C7! be such that |G| > (1 — «)|A||B| and 
|H| > (1 — £)|B||C|. By modifying the solution of Exercise 2.5.4, show 
that there exists subsets A’ C A and C’ CC with |A’| > (1—«!/?)|A| 
and |C’| > (1 — e!/7)|C| such that |A’ - (C’)“!| < Eee ee 

2.7.8 Let A be a multiplicative set such that |A - A~'| < K|A| and|A7!- A| < 
K|A|. Show that there exists a subset A of A such that |A| > |A|/2K 
and 

[AAs ACR ee Oma oil 
for all n > 2, where the product consists of n factors alternating between 
A and A“!, 

2.7.9 If $A$ and $B$ are multiplicative sets in a group $G$, show that there exist sets $X_1, X_2 \subseteq A$ such that $|X_1| \le \frac{|A \cdot B|}{|B|}$, $|X_2| \le \frac{|B \cdot A|}{|B|}$, and $A \subseteq X_1 \cdot B \cdot B^{-1}$ and $A \subseteq B^{-1} \cdot B \cdot X_2$, by modifying the proof of Lemma 2.14.

2.7.10 Prove Lemma 2.41.

2.7.11 Show that the direct analogue of Proposition 2.18 fails in the non-commutative case, even when $A = B = A^{-1}$.

2.7.12 Let $A, B$ be multiplicative sets in an ambient group $G$, and let $\tilde A$ be the set
\[ \tilde A := \Big\{a \in A : |\{(a', b, b') \in A \times B \times B : a = a'b(b')^{-1}\}| \ge \frac{|A||B|^2}{2|A \cdot B|}\Big\}. \]
Establish the bounds
\[ |\tilde A| \ge \frac{|A|^2}{2|A \cdot B|} \]







and 
Ver ey ore mee 
= las 


Compare this against Exercise 2.7.11. Hint: if x := aia; "azaz" be a typ- 
ical element of A - A~! - A- A~!, obtain at least (te 
of the form 





)° representations 


x = [abb a) ‘a4 |b5[aabo] | 


where aıb2, a4b2 € A- B, b}, b} € B, and (a})"'a, € AT! - A. 
2.7.13 Prove Theorem 2.48. 


2.8 Elementary sum-product estimates 


We now discuss some results concerning the sum set and product set of a subset 
A of a commutative ring Z, thus combining both the additive and multiplicative 
theory of the preceding sections (but keeping the multiplication commutative, 
for simplicity). The question here is to analyze the extent to which a set A can 
be approximately closed under addition and multiplication simultaneously. Of 
course, one way that this can happen is if A is a subring of Z; it appears that up to 
trivial changes (such as removing some elements, adding a small number of new 
elements, or dilating the set), this is essentially the only such example, although 
we currently only have a satisfactory and complete formalization of this principle 
when Z is a field (Theorem 2.55). In some ways the theory here is in fact easier than 
the sum set theory, because one can exploit two rather different structures arising 
from the smallness of A + A and the smallness of A - A to obtain a conclusion. 
As in the rest of this chapter, our discussion is for general fields, with a particular 
emphasis on the finite field Z,,. We remark that for the field R much better results 
are known, see Sections 8.3, 8.5. 

In this section Z will always denote a commutative ring, and Z* will denote the 
elements of Z which are not zero-divisors; these form a multiplicative cancella- 
tive commutative monoid in Z. The situation is significantly better understood in 
the case that Z is a field (see in particular Theorem 2.55 below); in such cases 
we shall emphasize this by writing the field as F instead of Z, and F* instead of 
F* = F\{0} to emphasize that F* is now a multiplicative group. A fundamen- 
tal concept in the field setting is that of a quotient set, which is the arithmetic 
equivalent of the concept of a quotient field of a division ring. 


100 2 Sum set estimates 


Definition 2.49 (Quotient set) Let $A$ be a finite subset of a field $F$ such that $|A| \ge 2$. Then the quotient set $Q[A]$ of $A$ is defined to be
\[ Q[A] := \frac{A - A}{(A - A)\setminus 0} = \Big\{\frac{a - b}{c - d} : a, b, c, d \in A;\ c \ne d\Big\}. \]
We also set $Q[A]^* := Q[A] \setminus 0$ to be the invertible elements in $Q[A]$.


Observe that $Q[A]$ contains both 0 and 1, and is symmetric under both additive and multiplicative inversion, thus $Q[A] = -Q[A]$ and $Q[A]^* = (Q[A]^*)^{-1}$. It is also invariant under translations and dilations of $A$, thus $Q[A] = Q[A + x] = Q[\lambda \cdot A]$ for all $x \in F$ and $\lambda \in F^*$. Geometrically, $Q[A]$ can be viewed as the set of slopes of lines connecting points in $A \times A$.

The relevance of the quotient set to sum-product estimates lies in the trivial but 
fundamental observation: 


Lemma 2.50 Let $A$ be a finite subset of a field $F$ such that $|A| \ge 2$, and let $x \in F$. Then $|A + x \cdot A| = |A|^2$ if and only if $x \notin Q[A]$.


Proof We have $|A + x \cdot A| = |A|^2$ if and only if the map $(a, b) \mapsto a + xb$ is injective on $A \times A$, which is true if and only if $a + xb \ne c + xd$ for all distinct $(a, b), (c, d) \in A \times A$, which after some algebra (using $Q[A] = -Q[A]$) is equivalent to asserting that $x \notin Q[A]$.


This has an immediate corollary: 














Corollary 2.51 If $A$ is a subset of a finite field $F$ such that $|A| > |F|^{1/2}$, then $Q[A] = F$.


Note that the condition $|A| > |F|^{1/2}$ is absolutely sharp, as can be seen by
considering the case when $A$ is a subfield of $F$ of index 2.
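Both Lemma 2.50 and Corollary 2.51 can be verified exhaustively in a small prime field. The sketch below assumes $F$ is the field of integers modulo a prime $p$ (here $p = 13$, an arbitrary illustrative choice); the helper names and the sets `A` and `B` are hypothetical. It relies on Python 3.8+ for `pow(d, -1, p)`.

```python
def quotient_set(A, p):
    """Q[A] = {(a - b)/(c - d) : a, b, c, d in A, c != d} in the field Z/pZ."""
    diffs = {(a - b) % p for a in A for b in A}
    return {(n * pow(d, -1, p)) % p for n in diffs for d in diffs if d != 0}

def sum_dilate(A, x, p):
    """The set A + x·A in Z/pZ."""
    return {(a + x * b) % p for a in A for b in A}

p = 13
A = {0, 1, 3}                      # |A|^2 = 9 < 13, so Q[A] may miss some x
Q = quotient_set(A, p)

# Lemma 2.50: A + x·A has the maximal size |A|^2 exactly when x is not in Q[A].
lemma_ok = all((len(sum_dilate(A, x, p)) == len(A) ** 2) == (x not in Q)
               for x in range(p))

# Corollary 2.51: once |B|^2 > p, no x can avoid Q[B].
B = {0, 1, 3, 9}
corollary_ok = quotient_set(B, p) == set(range(p))

print(lemma_ok, corollary_ok)
```

The corollary follows from the lemma by counting: $A + x \cdot A$ sits inside $F$, so it cannot have $|A|^2 > |F|$ elements.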

Lemma 2.50 has another important consequence: it gives a criterion under 
which Q[A] is a subfield of F. 


Corollary 2.52 Let $A$ be a finite subset of a field $F$ such that $|A| \ge 2$ and
\[ |A + Q[A] \cdot Q[A] \cdot A|,\ |A + (Q[A] + Q[A]) \cdot A| < |A|^2. \]
Then $Q[A]$ is a subfield of $F$.
This corollary may be compared with Exercise 2.6.5.


Proof From Lemma 2.50 and the hypotheses we see that $Q[A] \cdot Q[A] \subseteq Q[A]$ and $Q[A] + Q[A] \subseteq Q[A]$. In particular $Q[A]^* \cdot Q[A]^* = Q[A]^*$. Since $Q[A]$ is finite and contains 0, 1, we see from Proposition 2.7 that $Q[A]$ is an additive group, and similarly from the multiplicative version of this Proposition we see that $Q[A]^*$ is a multiplicative group. The claim follows.
















In order to use this corollary, one needs to control rational expressions of $A$ such as $A + Q[A] \cdot Q[A] \cdot A$. In analogy with sum set estimates such as Corollary 2.23, one might first expect that once $|A + A| \le K|A|$ and $|A \cdot A| \le K|A|$, then all polynomial or rational expressions of $A$ are controlled in cardinality by $CK^{C}|A|$. This however is not the case, even if one normalizes $A$ to contain 0 and 1. To see this, consider $A = G \cup \{x\}$ where $G$ is a subfield of $F$ and $x \notin G$. Then one easily verifies $|A + A|, |A \cdot A| \le 2|A|$ but $|A \cdot A + A \cdot A| \ge (|A| - 1)^2$, since $A \cdot A + A \cdot A$ contains $G + x \cdot G$, which has size $|G|^2$ by Lemma 2.50. This example is similar to one appearing in the preceding section, and it is resolved in a similar way, namely by passing from $A$ to a subset of $A$.
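The smallest instance of this example with an odd characteristic lives in the field with nine elements. The sketch below models that field as pairs $(a, b)$ standing for $a + bi$ with $i^2 = -1$ and coordinates modulo 3 (this works because $-1$ is not a square modulo 3); the representation, the helper names and the choice $x = i$ are illustrative assumptions, not taken from the text.

```python
P = 3  # characteristic; elements of the 9-element field are pairs (a, b) = a + b*i

def add(u, v):
    return ((u[0] + v[0]) % P, (u[1] + v[1]) % P)

def mul(u, v):
    a, b = u
    c, d = v
    # (a + bi)(c + di) = (ac - bd) + (ad + bc)i, using i^2 = -1
    return ((a * c - b * d) % P, (a * d + b * c) % P)

def sumset(X, Y):
    return {add(u, v) for u in X for v in Y}

def prodset(X, Y):
    return {mul(u, v) for u in X for v in Y}

G = {(0, 0), (1, 0), (2, 0)}   # the prime subfield
x = (0, 1)                     # the element i, which lies outside G
A = G | {x}

A_plus_A = sumset(A, A)
A_times_A = prodset(A, A)
big = sumset(A_times_A, A_times_A)   # A·A + A·A

print(len(A), len(A_plus_A), len(A_times_A), len(big))
```

Here $A + A$ and $A \cdot A$ both have at most $2|A| = 8$ elements, while $A \cdot A + A \cdot A$ already fills the whole field, which has $(|A| - 1)^2 = 9$ elements.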


Lemma 2.53 (Katz-Tao lemma) [199], [41] Let $Z$ be a commutative ring, and let $A \subseteq Z^*$ be a finite non-empty subset such that $|A + A| \le K|A|$ and $|A \cdot A| \le K|A|$ for some $K \ge 1$. Then there exists a subset $A'$ of $A$ such that $|A'| \ge |A|/2K - 1$ and $|A' \cdot A' - A' \cdot A'| = O(K^{O(1)}|A'|)$.


Note that this lemma works in arbitrary commutative rings, not just in fields. 
The requirement that none of the elements of A be zero-divisors is not serious in 
the case of a field, since one can simply remove the origin 0 from A if necessary, 
but is a non-trivial requirement in other commutative rings. 


Proof We use an argument from [41]. We may assume that $|A| \ge 10K$ (for instance) since the claim is trivial otherwise. Consider the dilates $\{a \cdot A : a \in A\}$ of $A$. Since $a \in Z^*$, $a \cdot A$ has the same cardinality as $A$. In particular we have
\[ \sum_{x \in A \cdot A} \sum_{a \in A} 1_{a \cdot A}(x) = |A|^2. \]
Since $|A \cdot A| \le K|A|$, we may apply Cauchy-Schwarz and conclude
\[ \sum_{x \in A \cdot A} \Big(\sum_{a \in A} 1_{a \cdot A}(x)\Big)^2 \ge |A|^3/K. \]
We rearrange this as
\[ \sum_{a, b \in A} |(a \cdot A) \cap (b \cdot A)| \ge |A|^3/K. \]
By the pigeonhole principle we can thus find a $b \in A$ such that
\[ \sum_{a \in A} |(a \cdot A) \cap (b \cdot A)| \ge |A|^2/K. \]
Fix this $b$. Setting $A'$ to be the set of all $a \in A$ such that
\[ |(a \cdot A) \cap (b \cdot A)| \ge |A|/2K \]




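we conclude that
\[ \sum_{a \in A'} |(a \cdot A) \cap (b \cdot A)| \ge |A|^2/2K \]
and hence $|A'| \ge |A|/2K$. By shrinking $A'$ by one if necessary we may assume $b \notin A'$. Now recall the Ruzsa distance $d(A, B) := \log \frac{|A - B|}{|A|^{1/2}|B|^{1/2}}$ and observe that $d(a \cdot A, a \cdot B) = d(A, B)$ whenever $a$ is not a zero-divisor. Then $d(A, A) \le 2d(A, -A) \le 2\log K$, and hence
\[ d(a \cdot A, a \cdot A) = d(b \cdot A, b \cdot A) = d(A, A) \le 2\log K \quad \text{for all } a \in A'. \]
Since $(a \cdot A) \cap (b \cdot A)$ is a large subset of $a \cdot A$ and $b \cdot A$, one can compute
\[ d(a \cdot A, a \cdot A \cap b \cdot A),\ d(b \cdot A, a \cdot A \cap b \cdot A) = O(1 + \log K) \]
and hence by the Ruzsa triangle inequality
\[ d(a \cdot A, b \cdot A) = O(1 + \log K) \quad \text{for all } a \in A'. \qquad (2.41) \]
Dilating this, we obtain
\[ d(a_1a_2 \cdot A, ba_2 \cdot A),\ d(ba_2 \cdot A, b^2 \cdot A) = O(1 + \log K) \quad \text{for all } a_1, a_2 \in A' \]
and hence by the Ruzsa triangle inequality
\[ d(a_1a_2 \cdot A, b^2 \cdot A) = O(1 + \log K) \quad \text{for all } a_1, a_2 \in A'. \qquad (2.42) \]

To proceed further we need to "invert" elements in $A$. For any $a \in A$ let $\hat a := \prod_{a' \in A' \setminus \{a\}} a' \in Z^*$. By dilating (2.41) (with $a$ replaced by $a_3$) by $a_1a_2 \prod_{a' \in A' \setminus \{a_3, b\}} a'$ for $a_1, a_2, a_3 \in A'$, we obtain
\[ d(a_1a_2\hat b \cdot A, ba_1a_2\hat a_3 \cdot A) = O(1 + \log K) \quad \text{for all } a_1, a_2, a_3 \in A'. \]
Meanwhile, from dilating (2.42) we have
\[ d(a_1a_2\hat b \cdot A, b^2\hat b \cdot A) = O(1 + \log K) \quad \text{for all } a_1, a_2 \in A'. \]
Applying the Ruzsa triangle inequality (and cancelling the common factor $b$, which is not a zero-divisor), we thus have
\[ d(a_1a_2\hat a_3 \cdot A, a_1'a_2'\hat a_3' \cdot A) = O(1 + \log K) \quad \text{for all } a_1, a_2, a_3, a_1', a_2', a_3' \in A' \]
and hence
\[ |a_1a_2\hat a_3 \cdot A - a_1'a_2'\hat a_3' \cdot A| = O(K^{O(1)})|A|. \]
Therefore we have
\[ \sum_{x, y \in A' \cdot A' \cdot \hat A'} |x \cdot A - y \cdot A| = O(K^{O(1)})|A||A' \cdot A' \cdot \hat A'|^2, \]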




where $\hat A' := \{\hat a : a \in A'\}$. But since $|A \cdot A| \le K|A|$ and $|A'| \ge |A|/2K - 1$, we see from the multiplicative version of sum set estimates (working in the formal multiplicative group generated by the cancellative commutative monoid $Z^*$) that $|A' \cdot A' \cdot \hat A'| = O(K^{O(1)}|A'|)$. We thus have
\[ \sum_{x, y \in A' \cdot A' \cdot \hat A'} |x \cdot A - y \cdot A| = O(K^{O(1)}|A'|^3). \]
We rewrite the left-hand side as
\[ \sum_{z \in Z} |\{(x, y) \in (A' \cdot A' \cdot \hat A')^2 : \exists a, b \in A \text{ such that } z = xa - yb\}|. \]
Write $\omega := \prod_{a \in A'} a$, and observe that whenever $a_1, a_2, a_3, a_4 \in A'$, the number $\omega(a_1a_2 - a_3a_4)$ has at least $|A'|^2$ representations of the form $xa - yb$ with $x, y \in A' \cdot A' \cdot \hat A'$ and $a, b \in A'$, with $(x, y)$ distinct, thanks to the identity
\[ \omega(a_1a_2 - a_3a_4) = (a_1a_2\hat a)a - (a_3a_4\hat b)b. \]
Thus
\[ |\omega \cdot (A' \cdot A' - A' \cdot A')| = O(K^{O(1)}|A'|) \]
and the claim follows since $\omega \in Z^*$.
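The first step of this proof, selecting a popular dilate $b \cdot A$ and the set $A'$ of elements whose dilates overlap it heavily, can be carried out explicitly. The sketch below assumes the ring is the field of integers modulo a prime $p$ and takes $K$ to be exactly $|A \cdot A|/|A|$, so that the threshold $|A|/2K$ becomes $|A|^2/(2|A \cdot A|)$; the function name `popular_dilates`, the prime $p = 31$ and the set `A` are illustrative choices. Unlike the proof, the sketch does not remove $b$ from $A'$.

```python
def dilate(a, A, p):
    """The dilate a·A in Z/pZ."""
    return {(a * x) % p for x in A}

def popular_dilates(A, p):
    """Return (b, A', A·A) as in the first step of the proof, with K = |A·A|/|A|."""
    AA = {(a * b) % p for a in A for b in A}

    def overlap_total(b):
        bA = dilate(b, A, p)
        return sum(len(dilate(a, A, p) & bA) for a in A)

    # Pigeonhole: pick b maximizing the total overlap with the other dilates.
    b = max(sorted(A), key=overlap_total)
    bA = dilate(b, A, p)
    # |a·A ∩ b·A| >= |A|/(2K) is the same as 2|A·A| |a·A ∩ b·A| >= |A|^2.
    A_prime = {a for a in A
               if 2 * len(AA) * len(dilate(a, A, p) & bA) >= len(A) ** 2}
    return b, A_prime, AA

p = 31
A = {1, 2, 4, 8, 16, 3, 5}     # nonzero residues, so no zero-divisors
b, A_prime, AA = popular_dilates(A, p)
print(b, sorted(A_prime), len(AA))
```

The Cauchy-Schwarz and pigeonhole steps guarantee $|A'| \ge |A|/2K$ for this choice of $K$, which in integer form reads $2|A \cdot A||A'| \ge |A|^2$.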





A modification of the above argument also gives the following statement, which 
can be viewed as a variant of Corollary 2.23 for the sum-product setting; we leave 
the proof to Exercise 2.8.1. 


Lemma 2.54 [43] Let $Z$ be a commutative ring, and let $A \subset Z^*$ be a finite non-empty
set such that $|A \cdot A - A \cdot A| \leq K|A|$. Then we have $|A^k - A^k| \leq K^{O(k)}|A|$
for all $k \geq 1$, where $A^k = A \cdot \ldots \cdot A$ is the $k$-fold product set of $A$.


We can now classify those finite subsets of fields with small additive doubling 
and multiplicative doubling constant, up to polynomial losses: 


Theorem 2.55 (Freiman theorem for sum-products) Let $A$ be a finite non-empty
subset of a field $F$, and let $K \geq 1$. Then the following statements are equivalent
up to constants, in the sense that if the $j$th property holds for some absolute
constant $C_j$, then the $k$th property will also hold for some absolute constant $C_k$
depending on $C_j$:

(i) $|A + A| \leq C_1 K^{C_1}|A|$ and $|A \cdot A| \leq C_1 K^{C_1}|A|$;

(ii) either $|A| \leq C_2 K^{C_2}$, or else there exists a subfield $G$ of $F$, a non-zero
element $x \in F$, and a set $X$ in $F$ such that $|G| \leq C_2 K^{C_2}|A|$, $|X| \leq C_2 K^{C_2}$,
and $A \subseteq x \cdot G \cup X$.


This is a slight strengthening of a result in [43], [44]. 




Proof We shall only show the forward implication, leaving the easy backward
implication to Exercise 2.8.2. By relabeling $C_1 K^{C_1}$ as $K$, we may thus assume
that $|A + A| \leq K|A|$ and $|A \cdot A| \leq K|A|$. We may assume that $|A| \geq C_0 K^{C_0}$ for
some large absolute constant $C_0$, since the claim is trivial otherwise. We may also
remove 0 from $A$ without any difficulty, thus we may assume $A \subset F^*$. Applying
Lemma 2.53 and Lemma 2.54, we may find a subset $A'$ of $A$ with $|A'| =
\Omega(K^{-O(1)}|A|)$ and $|(A')^k - (A')^k| = O(K)^{O(k)}|A'|$ for all $k \geq 1$. By Corollary 2.23
this implies that
\[ |n(A')^k - m(A')^k| \leq O(K)^{O(k(n+m))}|A'| \quad \text{for all } n, k, m \geq 1. \tag{2.43} \]


Dilating $A$ with a non-zero factor if necessary, we may assume $1 \in A'$ (noting that
the hypothesis and conclusion of the theorem are invariant under such dilations).
We may now add 0 back to $A'$ and $A$ without affecting (2.43).

Now we apply Corollary 2.52. Let $D := (A' - A') \setminus \{0\}$ and $G := Q[A'] =
(A' - A')/D$. Using lowest common denominators, we observe that
\[ A' + G \cdot G \cdot A' \subseteq \frac{A' \cdot D \cdot D - (A' - A') \cdot (A' - A') \cdot A'}{D^2} \subseteq \frac{4(A')^3 - 4(A')^3}{D^2}; \]
on the other hand, from (2.43) we have
\[ |(4(A')^3 - 4(A')^3) \cdot D^2| = O(K^{O(1)}|A'|), \]
so by the multiplicative version of Corollary 2.12 we see that
\[ |A' + G \cdot G \cdot A'| = O(K^{O(1)}|A'|) < |A'|^2 \]
if $C_0$ is sufficiently large. A similar argument gives $|A' + (G + G) \cdot A'| =
O(K^{O(1)}|A'|) < |A'|^2$. Applying Corollary 2.52 we see that $G$ is in fact a field.

Now let $x$ be a non-zero element of $A'$, and let $y$ be an element of $A'$. Then
$(a - y)/x \in Q[A'] = G$ for all $a \in A'$, thus
\[ A' \subseteq x \cdot G + y. \]
Thus
\[ x \cdot G + y \subseteq A' + A' \cdot Q[A'] \]
and hence
\[ (x \cdot G + y)^2 \subseteq (A' + A' \cdot Q[A'])^2. \]
But an argument using (2.43) and Corollary 2.12 as before gives $|(A' + A' \cdot
Q[A'])^2| = O(K^{O(1)}|A'|) \leq O(K^{O(1)}|G|)$. Direct computation shows that $|(x \cdot
G + y)^2| \geq |G|^2$ unless $y \in x \cdot G$. Thus (if $C_0$ is sufficiently large) we can take
$y \in x \cdot G$. Because $A'$ contains 1, we thus have $A' \subseteq G$.




Since $|A + A'| \leq K|A| = O(K^{O(1)}|A'|)$, we may apply Ruzsa's covering
lemma (Lemma 2.14) and cover $A$ by $O(K^{O(1)})$ translates of $A' - A'$, and hence
by $O(K^{O(1)})$ translates of $G$. A similar argument using the multiplicative version
of this lemma (and temporarily removing the non-invertible 0 element from
$A$ if necessary) covers $A$ by $O(K^{O(1)})$ dilates of $G$. On the other hand, we have
$|(G \cdot x) \cap (G + y)| \leq 1$ whenever $x \notin G$. Thus we have $|A \setminus G| = O(K^{O(1)})$, and
the claim follows.














This theorem implies that at least one of $A + A$ or $A \cdot A$ is large if $A$ does not
intersect with a subfield of $F$:


Corollary 2.56 (Sum-product estimate) [43], [44] Let $A$ be a finite non-empty
subset of a field $F$, and suppose that $K \geq 1$ is such that there is no finite subfield $G$
of $F$ of cardinality $|G| \leq K|A|$ and no $x \in F$ such that $|A \setminus (x \cdot G)| \leq K$. Then we
have either $|A| = O(K^{O(1)})$ or $|A + A| + |A \cdot A| = \Omega(K^c|A|)$ for some absolute
constant $c > 0$.


Remark 2.57 In the particular case when $F$ has no finite subfields we thus obtain
$|A + A| + |A \cdot A| = \Omega(|A|^{1+\varepsilon})$ for some absolute constant $\varepsilon > 0$; this result was
first obtained (when $F = \mathbf{R}$) by Erdős and Szemerédi [91]. In the setting of the
real line it was in fact conjectured in [91] that one can take $\varepsilon$ arbitrarily close to
1 in the above estimate. For the most recent value of $\varepsilon$, see Theorem 8.15.


In the particular case of the field $F = F_p$ of prime order, which has no subfields
other than {1} and $F_p$, one obtains


Corollary 2.58 (Sum-product estimate for $F_p$) [43], [44] Let $A$ be a non-empty
subset of $F_p$. Then
\[ |A + A| + |A \cdot A| = \Omega(\min(|A|, |F_p|/|A|)^c |A|) \]
for some absolute constant $c > 0$.
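To see the dichotomy of Corollary 2.58 on a small example, the following sketch computes $|A + A|$ and $|A \cdot A|$ in $F_p$ for an arithmetic progression and for a geometric progression. The prime $p = 1009$, the base 2, and the length 20 are arbitrary choices made for illustration only; nothing here is taken from the text beyond the definitions of sum set and product set.

```python
def sum_set(A, B, p):
    """A + B inside F_p."""
    return {(a + b) % p for a in A for b in B}

def product_set(A, B, p):
    """A . B inside F_p."""
    return {(a * b) % p for a in A for b in B}

p = 1009                                        # a prime, chosen arbitrarily
arithmetic = set(range(1, 21))                  # small sum set, larger product set
geometric = {pow(2, j, p) for j in range(20)}   # small product set, larger sum set

for name, A in (("arithmetic", arithmetic), ("geometric", geometric)):
    print(name, len(A), len(sum_set(A, A, p)), len(product_set(A, A, p)))
```

In both cases one of the two sets has the minimal size $2|A| - 1$, while the other is noticeably larger, so the sum $|A + A| + |A \cdot A|$ is never small.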


If $H$ is any non-empty subset of $F_p$, then we have $kH^k + kH^k,\ kH^k \cdot kH^k \subseteq
k^2 H^{2k}$ for all $k \geq 2$. Thus we have
\[ |k^2 H^{2k}| = \Omega(\min(|kH^k|, p/|kH^k|)^c |kH^k|) \]
for some absolute constant $c > 0$. We can iterate this estimate (starting with $k = 2$
and squaring repeatedly) to establish


Corollary 2.59 Let $H$ be any non-empty subset of $F_p$, and let $A, \delta > 0$. Then
there exists an integer $k = k(A, \delta) \geq 1$ such that
\[ |kH^k| = \Omega_{A,\delta}(\min(|H|^A, p^{1-\delta})). \]




We leave the proof of this corollary as an exercise. By using Lemma 4.10 from
Chapter 4 one can in fact set $\delta = 0$ here, though we will not need this fact here.

In the special case when $H$ is a multiplicative subgroup of $F_p$, we have $H^k = H$,
and hence Corollary 2.59 gives
\[ |kH| = \Omega_{A,\delta}(\min(|H|^A, p^{1-\delta})). \]
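The growth of the iterated sum sets $kH$ of a multiplicative subgroup can be observed directly on a small example. The sketch below is illustrative only: the prime $p = 1009$ and the element $11^{(p-1)/16}$ used to generate $H$ are arbitrary choices (so $|H|$ divides 16), and the helper names are not from the text.

```python
def k_fold_sum(H, k, p):
    """Return kH = H + ... + H (k summands) inside F_p."""
    total = {0}
    for _ in range(k):
        total = {(s + h) % p for s in total for h in H}
    return total

def subgroup_generated_by(g, p):
    """Multiplicative subgroup of F_p^* generated by a non-zero element g."""
    H, x = set(), 1
    while x not in H:
        H.add(x)
        x = (x * g) % p
    return H

p = 1009
# pow(11, (p - 1) // 16, p) has multiplicative order dividing 16.
H = subgroup_generated_by(pow(11, (p - 1) // 16, p), p)
sizes = [len(k_fold_sum(H, k, p)) for k in (1, 2, 3, 4)]
print(len(H), sizes)
```

Since $H \cdot H = H$ for a subgroup, only the additive iteration needs to be tracked here, in line with the identity $H^k = H$ used above.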


Thus multiplicative subgroups have rather rapid additive expansion. It turns out 
that one can do something similar for approximate groups: 


Theorem 2.60 [40] Let $H$ be a non-empty subset of $F_p$ such that $|H^2| \leq K|H|$,
and let $A, \delta > 0$. Then there exists an integer $k = k(A, \delta) \geq 1$ such that
\[ |kH| = \Omega_{A,\delta}(K^{-O_{A,\delta}(1)} \min(|H|^A, p^{1-\delta})). \]


This result can be deduced from Corollary 2.59 and the following proposition; 
we leave the precise deduction as an exercise. 


Proposition 2.61 Let $F$ be an arbitrary field, and let $H \subset F^*$ be a finite
non-empty subset of invertible field elements such that $|H^2| \leq K|H|$ for some
$K \geq 1$. Let $k \geq 1$ and $L \geq 1$ be such that $kH$ obeys the following "additive
non-expansion" property: we have $|2kH| \leq L|kH''|$ for any subset $H''$ of $H$
of cardinality $|H''| \geq \frac{1}{2K}|H|$. Then there exists a subset $H'$ of $H$ of cardinality
$|H'| \geq \frac{1}{2K}|H|$ such that
\[ |j(H')^j| = O_j((1 + \log|H|)^{O_j(1)} K^{O_j(1)} L^{O_j(1)} |kH|) \]
for all $j \geq 1$.


Proof From the multiplicative version of Exercise 2.3.24 we can find $H' \subseteq H$
with $|H'| \geq \frac{1}{2K}|H|$ and $h_0 \in H'$ such that $|(h \cdot H) \cap (h_0 \cdot H)| \geq \frac{1}{2K}|H|$ for all
$h \in H'$. By dilation we may normalize $h_0 = 1$. From the additive non-expansion
property we conclude that
\[ |2kH| \leq L|k((h \cdot H) \cap H)| \leq L|A_h| \quad \text{for all } h \in H', \]
where $A_h := k(h \cdot H) \cap kH$. Since
\[ |kH + A_h| \leq |2kH|; \quad |k(h \cdot H) + A_h| \leq |2k(h \cdot H)| = |2kH| \]
we thus obtain the Ruzsa distance estimates
\[ d(kH, -A_h),\ d(k(h \cdot H), -A_h) \leq \log L \]
and hence by the triangle inequality
\[ d(kH, k(h \cdot H)) \leq 2\log L. \tag{2.44} \]




Now we turn to controlling $j(H')^j$ for some $j$. We first observe that
\[ |(H')^2| \leq |H^2| \leq K|H| \leq 2K^2|H'| \]
and thus by the multiplicative analog of Exercise 2.3.10 we have
\[ |(H')^2 \cdot (H')^{-1}| = O(K^{O(1)}|H'|). \]
We can then apply the multiplicative version of Exercise 1.1.8 to obtain a set
$X \subseteq (H')^2 \cdot (H')^{-1}$ of cardinality $|X| = O(K^{O(1)}(1 + \log|H|))$ such that $(H')^2 \subseteq
X \cdot H'$, and thus $(H')^j \subseteq X^{j-1} \cdot H'$. Thus by the pigeonhole principle we can
bound
\[ |j(H')^j| \leq |X|^{j(j-1)} |x_1 \cdot H' + \cdots + x_j \cdot H'| \]
for some $x_1, \ldots, x_j \in X^{j-1}$; it thus suffices to show that
\[ |x_1 \cdot H' + \cdots + x_j \cdot H'| = O_j(L^{O_j(1)}|kH|). \]
Since $xH'$ is contained in a translate of $k(xH')$, we have the somewhat crude
estimate
\[ |x_1 \cdot H' + \cdots + x_j \cdot H'| \leq |jB| \]
where $B := k(x_1 \cdot H) \cup \cdots \cup k(x_j \cdot H)$. But the $x_i$ are all products of $O(j)$ elements
from $H'$ and $(H')^{-1}$. From repeated application of (2.44) and the triangle
inequality we conclude that
\[ d(k(x_i \cdot H), k(x_{i'} \cdot H)) \leq O(j \log L) \quad \text{for all } 1 \leq i, i' \leq j \]
and hence
\[ d(B, B) \leq O(j \log L) + O(\log j). \]
From Exercise 2.3.10 we conclude that $|jB| = O_j(L^{O_j(1)}|B|)$, and the claim
follows.














By combining Theorem 2.60 with the asymmetric Balog–Szemerédi–Gowers
theorem, we can show that multiplicative subgroups of $F_p$ cannot have high additive
energy:


Corollary 2.62 Let $H$ be a multiplicative subgroup of $F_p$ such that $|H| \geq p^{\delta}$ for
some $0 < \delta < 1$. Then there exists an $\varepsilon = \varepsilon(\delta) > 0$, depending only on $\delta$, such that
$E(A, H) \leq p^{-\varepsilon}|A||H|^2$ for all $A \subseteq F_p$ with $1 \leq |A| \leq p^{1-\delta}$, if $p$ is sufficiently
large depending on $\delta$.




Proof Let $\varepsilon' = \varepsilon'(\delta) > 0$ be a small number to be chosen later, and let $\varepsilon =
\varepsilon(\varepsilon', \delta) > 0$ be an even smaller number to be chosen later. Suppose for contradiction
that there existed a set $A$ such that $E(A, H) > p^{-\varepsilon}|A||H|^2$. Applying
Corollary 2.36 (with $L := p$ and $\varepsilon$ replaced by $\varepsilon'$) we can find (if $\varepsilon$ is sufficiently
small depending on $\varepsilon'$) a subset $H'$ of $H$ with cardinality
\[ |H'| = \Omega_{\varepsilon'}(p^{-\varepsilon'}|H|) \]
such that
\[ |kH'| \leq |A + kH'| = O_{\varepsilon',k}(p^{\varepsilon'}|A|) \]
for all $k$. Since $H$ is a multiplicative subgroup, we see that
\[ |H' \cdot H'| \leq |H^2| = |H| = O_{\varepsilon'}(p^{\varepsilon'}|H'|). \]
Since $|H| \geq p^{\delta}$, we also see (if $\varepsilon'$ is sufficiently small depending on $\delta$) that $|H'|^A \geq
p^{1-\delta/2}$ for some $A$ depending only on $\delta$. We can thus apply Theorem 2.60 (with
$\delta$ replaced by $\delta/2$) and conclude that for a sufficiently large $k$ depending on $\delta$ we
have
\[ |kH'| = \Omega_{\varepsilon',\delta}(p^{1-\delta/2-O_{\delta}(\varepsilon')}). \]
This gives a contradiction if $\varepsilon'$ is sufficiently small depending on $\delta$, and $p$ is
sufficiently large.
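For readers who wish to experiment with the additive energy appearing in Corollary 2.62, here is a minimal sketch. It assumes the standard definition $E(A, B) = |\{(a, a', b, b') \in A \times A \times B \times B : a + b = a' + b'\}|$ from earlier in the chapter and computes it through the representation function $r(x) = |\{(a, b) : a + b = x\}|$, using $E(A, B) = \sum_x r(x)^2$; the function names are illustrative.

```python
from collections import Counter

def additive_energy(A, B, p):
    """E(A, B) = #{(a, a', b, b') : a + b = a' + b'} inside F_p."""
    reps = Counter((a + b) % p for a in A for b in B)   # r(x) for each sum x
    return sum(r * r for r in reps.values())

def sum_set_size(A, B, p):
    """|A + B| inside F_p."""
    return len({(a + b) % p for a in A for b in B})

# Cauchy-Schwarz gives |A|^2 |B|^2 <= E(A, B) |A + B|, so small energy
# forces a large sum set.
p = 101
A, B = {0, 1, 2}, {0, 1, 2}
print(additive_energy(A, B, p), sum_set_size(A, B, p))
```

The Cauchy–Schwarz relation in the comment is what converts an energy bound such as that of Corollary 2.62 into a lower bound on $|A + H|$.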














We shall apply this to exponential sums over multiplicative subgroups; see 
Theorem 4.41. For a variant of this estimate, see Lemma 9.44. 

It seems of interest to obtain estimates of this type for more general commutative 
rings, and possibly even to non-commutative rings by combining these arguments 
with those in the preceding section. In this direction, Bourgain has established 


Theorem 2.63 [41] Let $p$ be a large prime, and let $A$ be a subset of the commutative
ring $F_p \times F_p$ (endowed with the product structure $(a, b) \cdot (c, d) = (ac, bd)$)
such that $|A| \geq p^{\delta}$ and $|A + A|, |A \cdot A| \leq p^{\varepsilon}|A|$ for some $\delta, \varepsilon > 0$. Then there
exists a set $G$ of $F_p \times F_p$ such that $|G| \leq p^{O_{\delta}(\varepsilon)}|A|$ and $|A \cap G| \geq p^{-O_{\delta}(\varepsilon)}|A|$,
where $G$ is one of the following objects:


• the whole space $G = F_p \times F_p$;
• a horizontal line $G = F_p \times \{a\}$ for some $a \in F_p$;
• a vertical line $G = \{a\} \times F_p$ for some $a \in F_p$;
• a line $G = \{(x, ax) : x \in F_p\}$ for some $a \in F_p^*$.


We sketch a proof of this proposition in the exercises. This is not as complete 
a characterization of sets with small sum-product as Theorem 2.55 — in particular, 
it does not address the case of very small A — but is already sufficient to control 




a number of exponential sums of importance in number theory and cryptography. 
See [41], [40]. 

The problem of obtaining good sum-product estimates when the ambient commutative
ring is the integers $Z = \mathbf{Z}$ has attracted a lot of interest. In this case it has
been conjectured by Erdős and Szemerédi [91] that
\[ |kA| + |A^k| = \Omega_{k,\varepsilon}(|A|^{k-\varepsilon}) \tag{2.45} \]


for all $\varepsilon > 0$, all $k \geq 2$ and all additive sets $A \subset \mathbf{Z}$. Even the $k = 2$ case is open
(and considered very difficult); this $k = 2$ case has currently been verified for all
$\varepsilon > \frac{8}{11}$, see Theorem 8.15. In another direction towards (2.45), a recent result of
Bourgain and Chang [42] has shown that for every $m \geq 1$ there exists an integer
$k = k(m) \geq 1$ such that
\[ |kA| + |A^k| = \Omega_m(|A|^m) \tag{2.46} \]


for all additive sets A C Z. This last result is rather deep, in particular using an 
intricate “induction on scales” argument, coupled with some quantitative Freiman- 
type theorems. 


Exercises 


2.8.1 [41] Modify the proof of Lemma 2.53 to prove Lemma 2.54. (Hint: first
use multiple applications of the triangle inequality to obtain control on
$|x \cdot A - y \cdot A|$ for all $x, y \in A^k \cdot \hat{A}$.)

2.8.2 Prove the remaining implication in Theorem 2.55. 

2.8.3 Deduce Corollary 2.56 and Corollary 2.58 from Theorem 2.55. 

2.8.4 [44], [43] Let $A, A', B$ be non-empty subsets of a field $F$ such that $0 \notin B$.
Using the first moment method, show that there exists $\xi \in B$ such that
\[ E(A, \xi \cdot A') \leq \frac{|A|^2|A'|^2}{|B|} + |A||A'| \]
and conclude from (2.8) that
\[ |A + \xi \cdot A'| \geq \frac{|A||A'||B|}{|A||A'| + |B|}. \]


2.8.5 [44] Let $A$ be a subset of a finite field $F$ such that $|A| > |F|^{1/2}$. Show that
$|(A - A) \cdot A + (A - A) \cdot A| \geq \sup_{x \in F}|A + x \cdot A| > \frac{|F|}{2}$ and then conclude
that
\[ F = (A - A) \cdot A + (A - A) \cdot A + (A - A) \cdot A + (A - A) \cdot A. \]
(Hints: the first inequality follows easily from Corollary 2.51. For the
second inequality, use Exercise 2.8.4.)


2.8.6 (Croot, personal communication) Let $A$ be a subset of a finite field $F$ such
that $|A| > |F|^{1/k}$ for some integer $k \geq 2$. Show that $|Q[A]| \geq |F|^{1/(k-1)}$;
this clearly generalizes Corollary 2.51. (Hint: exploit the fact that the
maps $(a_1, \ldots, a_k) \mapsto x_1 a_1 + \cdots + x_k a_k$ fail to be injective for arbitrary
$x_1, \ldots, x_k \in F$.)

2.8.7 [43] Let $A$ be a subset of a field $F$ such that $|A| > |F|^{\varepsilon}$ for some $\varepsilon > 0$.
Show that there exists an integer $k = k(\varepsilon) \geq 1$ depending only on $\varepsilon$ such
that $k(A^k) - k(A^k) = G$ for some subfield $G$ of $F$. (Use Exercise 2.8.5
or Lemma 4.10.)

2.8.8 [41] Let $F_p$ be a field of prime order $p$ and $Z = F_p \times F_p$. Let $A \subseteq Z$
be such that $|A \cap (\{a\} \times F_p)| \geq p^{\delta}$ and $|A \cap (\{b\} \times F_p)| \geq p^{\delta}$ for some
$0 < \delta < 1$ and $a, b \in F_p$. Show that for some $k = k(\delta) > 0$ we have
$k(A^k) - k(A^k) = Z$. (Hint: use Exercise 2.8.7.)

2.8.9 [41] Let $F_p$, $Z$ be as in Exercise 2.8.8, and let $\pi_1 : Z \to F_p$, $\pi_2 :
Z \to F_p$ be the coordinate projections. Suppose that $A \subseteq Z$ is such that
$|\pi_1(A)|, |\pi_2(A)| \geq p^{\delta}$ for some $0 < \delta < 1$ and such that at least one
of $\pi_1, \pi_2$ is not injective. Show that for some $k = k(\delta) > 0$ we have
$k(A^k) - k(A^k) = Z$. (Hint: by Exercise 2.8.8 it suffices to find some $k'$
such that $k'(A^{k'}) - k'(A^{k'})$ contains a large intersection with either a
horizontal line or a vertical line.)

2.8.10 [41] Let $F_p$, $Z$, $\pi_1$, $\pi_2$ be as in Exercises 2.8.8, 2.8.9. Suppose that $A \subseteq Z$
is such that $|\pi_1(A)|, |\pi_2(A)| \geq p^{\delta}$ for some $0 < \delta < 1$. Show that either
$A$ is contained in a line $\{(x, ax) : x \in F_p\}$ for some $a \in F_p^*$, or else
$k(A^k) - k(A^k) = Z$ for some $k = k(\delta) > 0$. (Hint: by Exercise 2.8.7 one
can reduce to the case where $\pi_1(A) = \pi_2(A) = F_p$. Now divide into two
cases depending on whether $\pi_1$ or $\pi_2$ is injective on $2A - 2A$ or not.)

2.8.11 [41] Use Exercise 2.8.10 and Lemmas 2.53, 2.54 to deduce Theorem 2.63.
(You will have to take a small amount of care concerning the zero-divisors
$(\{0\} \times F_p) \cup (F_p \times \{0\})$.)

2.8.12 Let $Z$ be a commutative ring, and $A_1, A_2, A_3, A_4$ be subsets of $Z^*$
such that $|A_1| = |A_2| = |A_3| = |A_4| = N$ and $|A_1 \cdot A_2 - A_3 \cdot A_4| \leq
KN$. Show that $|A_j \cdot A_j - A_j \cdot A_j| \leq K^{O(1)}N$ for all $j = 1, 2, 3, 4$. This
lemma allows one to extend several of the above results to the setting
where the single set $A$ is replaced by a number of sets of comparable
cardinality.

2.8.13 Prove Corollary 2.59.

2.8.14 Use Corollary 2.59 and Proposition 2.61 to prove Theorem 2.60. (Hint:
start with $k$ equal to a large power of 2, and set $L$ equal to a small
power of $|H|$. If the hypotheses of Proposition 2.61 are satisfied, then
one can lower bound $|kH|$ by $|j(H')^j|$, which can be controlled using
Corollary 2.59. If not, we can lower bound $|2kH|$ by $L|kH'|$ for some
large subset $H'$ of $H$; now replace $k$ by $k/2$ and $H$ by $H'$ and argue
as before. Continuing this process, one eventually obtains a good lower
bound on $|kH|$ or $|2kH|$, either by combining Proposition 2.61 with
Corollary 2.59, or by accumulating enough powers of $L$.)

2.8.15 [40] Prove the following variant of Corollary 2.62: for any $\delta > 0$
there exists $\varepsilon > 0$ such that whenever $H, A$ are subsets of $F_p$ with
$|H| \geq p^{\delta}$, $|H \cdot H| \leq p^{\varepsilon}|H|$, and $1 \leq |A| \leq p^{1-\delta}$, then $E(A, H) =
O_{\delta}(p^{-\varepsilon}|A||H|^2)$. In particular we have $|A + H| = \Omega(p^{\varepsilon}|H|)$.

2.8.16 [18] Let $A$ be an additive set in $F_p$ such that $|A| \leq p^{1-\delta}$ for some
$\delta > 0$. Show that there exists an $\varepsilon > 0$ depending on $\delta$ such that
$|\{(a, b, c, d, e, f) \in A^6 : ab + c = de + f\}| = O_{\delta,\varepsilon}(|A|^{5-\varepsilon})$. (Hint: use
the Balog–Szemerédi–Gowers theorem in both the additive and multiplicative
forms, together with Corollary 2.58.) This estimate is used in
[18] to show that iterations of the map $X \mapsto X_1 \cdot X_2 + X_3$ on random
variables in $F_p$ (where $X_1, X_2, X_3$ are independent trials of $X$) converge
in a certain sense to the uniform distribution, which has applications to
random number generation.


3 





Additive geometry 


In Chapter 2 we studied the elementary theory of sum sets A + B for general 
subsets A, B of an arbitrary additive group Z. In order to progress further with 
this theory, it is important first to understand an important subclass of such sets, 
namely those with a strong geometric and additive structure. Examples include 
(generalized) arithmetic progressions, convex sets, lattices, and finite subgroups. 
We will term the study of such sets (for want of a better name) additive geome- 
try; this includes in particular the classical convex geometry of Minkowski (also 
known as geometry of numbers). Our aim here is to classify these sets and to 
understand the relationship between their geometrical structure, their dimension 
(or rank), their size (or volume, or measure), and their behavior under addition or 
subtraction. Despite looking rather different at first glance, it will transpire that 
progressions, lattices, groups, and convex bodies are all related to each other, both 
in a rigorous sense and also on the level of heuristic analogy. For instance, pro- 
gressions and lattices play a similar role in arithmetic combinatorics that balls 
and subspaces play in the theory of normed vector spaces. In later sections, by 
combining methods of additive geometry, sum set estimates, Fourier analysis, and 
Freiman homomorphisms, we will be able to prove Freiman’s theorem, which 
shows that all sets with small doubling constant can be efficiently approximated 
by progressions and similarly structured sets. 

Closely related to all of these additive geometric sets are Bohr sets, which are in
many ways the dual object to progressions, but we shall postpone the discussion of
these sets (and their relationship with progressions) to Section 4.4, once we have
introduced the Fourier transform.




3.1 Additive groups 


We first review the theory of additive groups, which we introduced in Definition 0.1, 
obtaining in particular the classification theorem for finitely generated additive 
groups (Corollary 3.9). This is a fundamental result in additive group theory, but it 
will also motivate similar results concerning other additively structured sets such 
as progressions, Bohr sets, and the intersection of convex sets and lattices. 

Typical examples of additive groups include the integers $\mathbf{Z}$, the reals $\mathbf{R}$, the
lattices $\mathbf{Z}^d$, the Euclidean spaces $\mathbf{R}^d$, the torus groups $\mathbf{R}^d/\mathbf{Z}^d$, and the cyclic
groups $\mathbf{Z}_N := \mathbf{Z}/N \cdot \mathbf{Z}$. Note that the direct sum $Z \oplus Z'$ of two additive groups is
again an additive group. We now make an important distinction between torsion
groups and torsion-free groups.


Definition 3.1 (Torsion) If $Z$ is an additive group and $x \in Z$, we let ord($x$) be the
least integer $n \geq 1$ such that $n \cdot x = 0$, or ord($x$) $= +\infty$ if no such integer exists.
We say that $Z$ is a torsion group if ord($x$) is finite for all $x \in Z$, and we say that it
is an $r$-torsion group for some $r \geq 1$ if ord($x$) divides $r$ for all $x \in Z$. We say that
$Z$ is torsion-free if ord($x$) $= +\infty$ for all non-zero $x \in Z$.


Examples 3.2 The groups $\mathbf{Z}$, $\mathbf{R}$, $\mathbf{Z}^d$, $\mathbf{R}^d$ are torsion-free, whereas any finite group
such as $\mathbf{Z}_N$ is a torsion group.
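As a concrete illustration of Definition 3.1 in the cyclic group $\mathbf{Z}_N$, the following sketch computes ord($x$) in two ways: by direct search, and by the closed formula $N/\gcd(x, N)$ (a standard fact about cyclic groups, not stated in the text). The function names are illustrative.

```python
from math import gcd

def order(x, N):
    """ord(x) in Z_N via the closed formula N / gcd(x, N)."""
    return N // gcd(x % N, N)

def order_by_search(x, N):
    """ord(x) in Z_N as the least n >= 1 with n * x = 0 (mod N)."""
    n = 1
    while (n * x) % N != 0:
        n += 1
    return n

N = 12
print([order(x, N) for x in range(N)])
```

Every order divides $N$, so $\mathbf{Z}_N$ is an $N$-torsion group in the sense of the definition.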


A group homomorphism $\phi : Z \to Z'$ between two additive groups $Z, Z'$ is any
map which preserves addition, negation, and zero (thus $\phi(x + y) = \phi(x) + \phi(y)$,
$\phi(-x) = -\phi(x)$, and $\phi(0) = 0$ for all $x, y \in Z$). If $\phi$ is also invertible, then the
inverse $\phi^{-1}$ is automatically a group homomorphism, and we say that $\phi$ is a group
isomorphism, and $Z$ and $Z'$ are group isomorphic. Since all of our notions here
shall be defined in terms of the addition, negation, and zero operations, they will
all be preserved by group isomorphism, and so we will treat group isomorphic
groups as essentially equivalent. Later on we shall develop a weaker notion of
Freiman homomorphism and Freiman isomorphism which is more suitable for the
study of "approximate groups" (sets that are "almost" closed under addition); see
Section 5.3.

If $G$ is a subgroup of an additive group $Z$, then we can form the quotient group
\[ Z/G := \{x + G : x \in Z\} \]
formed by taking all the cosets of $G$; this is easily verified to be a group (though
it is no longer a subgroup of $Z$). For instance, the cyclic group $\mathbf{Z}_N = \mathbf{Z}/(N \cdot \mathbf{Z})$
is the quotient of the integers $\mathbf{Z}$ by the subgroup $N \cdot \mathbf{Z}$. Observe that the map
$\pi : Z \to Z/G$ defined by $\pi(x) := x + G$ is a surjective homomorphism.
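The quotient construction can be made concrete in a small finite case. The sketch below (with the illustrative choices $Z = \mathbf{Z}_{12}$ and $G = \{0, 4, 8\}$, which are not from the text) represents each coset $x + G$ as a frozen set and lets one check that the projection respects addition.

```python
def projection(x, G, N):
    """The coset x + G inside Z_N, represented as a frozenset."""
    return frozenset((x + g) % N for g in G)

def cosets(G, N):
    """The quotient Z_N / G as the set of all cosets of G."""
    return {projection(x, G, N) for x in range(N)}

def coset_sum(C1, C2, N):
    """Sum of two cosets, computed elementwise."""
    return frozenset((a + b) % N for a in C1 for b in C2)

N = 12
G = {0, 4, 8}              # the subgroup 4 . Z_12
quotient = cosets(G, N)
print(len(quotient), sorted(sorted(c) for c in quotient))
```

Here the quotient has $12/3 = 4$ elements, and the elementwise sum of two cosets is again a single coset, which is what makes the projection a homomorphism.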




The sumset $G + H$ and intersection $G \cap H$ of two subgroups are still subgroups.
Indeed, the arbitrary intersection of a family of subgroups is still a
subgroup. Hence, given any subset $X$ of $Z$, we can define the span $\langle X \rangle$ of $X$
to be the smallest subgroup of $Z$ which contains $X$; equivalently, $\langle X \rangle$ is the space
of all finite $\mathbf{Z}$-linear combinations of elements of $X$. Thus for instance if $x \in Z$,
then $\langle x \rangle$ is a group with cardinality ord($x$). We say that an additive group $Z$ is
finitely generated if it can be written as the span $Z = \langle X \rangle$ of some finite set $X$.
Clearly, every additive set $X$ is contained in at least one finitely generated group,
namely $\langle X \rangle$. Thus in the theory of additive sets one can usually reduce to the case
when the ambient group $Z$ is finitely generated (though it is sometimes convenient
to work in some selected non-finitely generated additive groups, such as $\mathbf{Q}$, $\mathbf{R}$, or
$\mathbf{R}^d$). In Corollary 3.9, we shall completely classify all finitely generated additive
groups up to isomorphism.


Let $v = (v_1, \ldots, v_d) \in Z^d$ denote a $d$-tuple of elements in $Z$. We can rewrite the
span $\langle v \rangle := \langle \{v_1, \ldots, v_d\} \rangle$ of this $d$-tuple in the following manner. For any element
$n = (n_1, \ldots, n_d) \in \mathbf{Z}^d$ we define the dot product $n \cdot v$ in the usual manner as
\[ n \cdot v := n_1 v_1 + \cdots + n_d v_d. \]
The map $n \mapsto n \cdot v$ is then a homomorphism from $\mathbf{Z}^d$ to $Z$, and its image $\mathbf{Z}^d \cdot v$
is precisely the span of $v$:
\[ \langle v \rangle = \mathbf{Z}^d \cdot v. \]
The notion of a progression, introduced in Definition 0.2, is a truncated version of
the concept of a span, in which the infinite lattice $\mathbf{Z}^d$ is replaced instead by a box.
Alternatively, one can think of lattices as infinite progressions.
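The description of the span as the image $\mathbf{Z}^d \cdot v$ can be tested by brute force when the ambient group is a cyclic group $\mathbf{Z}_N$. In the sketch below (an illustration under that assumption; the names are hypothetical) the coefficients $n_i$ only need to range over $\{0, \ldots, N-1\}$, because $N \cdot v_i = 0$ in $\mathbf{Z}_N$.

```python
from itertools import product

def dot(n, v, N):
    """The dot product n . v = n_1 v_1 + ... + n_d v_d, computed in Z_N."""
    return sum(n_i * v_i for n_i, v_i in zip(n, v)) % N

def span(v, N):
    """The span <v> = Z^d . v inside Z_N, as a set of residues."""
    return {dot(n, v, N) for n in product(range(N), repeat=len(v))}

N = 12
print(sorted(span((4, 6), N)))   # subgroup generated by 4 and 6 in Z_12
print(sorted(span((3,), N)))     # cyclic subgroup generated by 3
```

For $v = (4, 6)$ in $\mathbf{Z}_{12}$ the span has 6 elements, which lies between the least common multiple and the product of ord(4) = 3 and ord(6) = 2 (both equal to 6 here).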


3.1.1 Lattices 


We now study a special type of additive group, namely the lattices in Euclidean 
space. 


Definition 3.3 (Lattices) A lattice $\Gamma$ in $\mathbf{R}^d$ is any additive subgroup of the
Euclidean space $\mathbf{R}^d$ which is discrete (i.e. every point in $\Gamma$ is isolated). We define
the rank $k$ of $\Gamma$ to be the dimension of the linear space spanned by the elements
of $\Gamma$, thus $0 \leq k \leq d$. If $k = d$, we say that $\Gamma$ has full rank. If $\Gamma'$ is another lattice
in $\mathbf{R}^d$ which is contained in $\Gamma$, we say that $\Gamma'$ is a sub-lattice of $\Gamma$.


Thus for instance $\mathbf{Z}^d$ is a lattice of full rank in $\mathbf{R}^d$. More generally, a typical
example of a lattice of rank $k$ is the set $\mathbf{Z}^k \cdot v$, where $v = (v_1, \ldots, v_k)$ is a collection
of linearly independent vectors in $\mathbf{R}^d$ for some $0 \leq k \leq d$. In fact, this is the
only possible type of lattice, as we shall see in Lemma 3.4. We observe that if
$T : \mathbf{R}^d \to \mathbf{R}^d$ is an invertible linear transformation on $\mathbf{R}^d$, and $\Gamma$ is a lattice, then
$T(\Gamma)$ is also a lattice with the same rank as $\Gamma$.

If $\Gamma$ is a lattice, then the quotient space $\mathbf{R}^d/\Gamma$ is a smooth manifold with a
natural Lebesgue (or Haar) measure induced from $\mathbf{R}^d$. If $\Gamma$ has full rank, it is easy
to see that $\mathbf{R}^d/\Gamma$ is also compact, and thus has a volume mes($\mathbf{R}^d/\Gamma$), which we
refer to as the covolume of $\Gamma$.
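When a full rank lattice is presented as $\mathbf{Z}^d \cdot v$ for a basis $v = (v_1, \ldots, v_d)$, its covolume equals the volume of the parallelepiped spanned by the basis, that is, $|\det(v_1, \ldots, v_d)|$. This is a standard fact rather than something stated in the text, and the following sketch (with illustrative names) computes it exactly over the rationals.

```python
from fractions import Fraction

def determinant(rows):
    """Exact determinant of a square matrix by Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    n, det = len(m), Fraction(1)
    for c in range(n):
        # Find a row at or below the diagonal with a non-zero entry in column c.
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)          # linearly dependent rows
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det                  # a row swap flips the sign
        det *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            m[r] = [a - factor * b for a, b in zip(m[r], m[c])]
    return det

def covolume(basis):
    """Covolume of the lattice generated by the rows of `basis`."""
    return abs(determinant(basis))

print(covolume([[1, 0], [0, 1]]), covolume([[2, 1], [0, 3]]))
```

For a sub-lattice of $\mathbf{Z}^d$ the covolume is also the number of cosets, so the lattice generated by $(2, 1)$ and $(0, 3)$ has index 6 in $\mathbf{Z}^2$.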

Next, we classify all lattices in $\mathbf{R}^d$. Call a vector $v$ in $\Gamma$ irreducible if $v/n \notin \Gamma$
for any integer $n \geq 2$.


Lemma 3.4 (Fundamental theorem of lattices) If $\Gamma$ is a lattice in $\mathbf{R}^d$ of rank $k$,
then there exist linearly independent vectors $v_1, \ldots, v_k$ in $\mathbf{R}^d$ such that $\Gamma = \mathbf{Z}^k \cdot v$.
In particular every lattice of rank $k$ is finitely generated and is isomorphic (via an
invertible linear transformation from the linear span of $\Gamma$ to $\mathbf{R}^k$) to the standard
lattice $\mathbf{Z}^k$. Furthermore, if $w$ is an irreducible vector in $\Gamma$, we may choose the
above representation $\Gamma = \mathbf{Z}^k \cdot v$ so that $v_1 = w$.


Proof We first observe that we may assume that the vectors in $\Gamma$ span $\mathbf{R}^d$, else
we could pass from $\mathbf{R}^d$ to a smaller vector space and continue the argument. In
other words, we may assume that the rank $k$ of $\Gamma$ is equal to $d$. We may also clearly
assume that $d \geq 1$, since the $d = 0$ case is vacuously true.

Observe that $\Gamma$ contains at least one irreducible vector $w$, since one can start
with any non-zero vector $v$ in $\Gamma$ and take $w$ to be the smallest vector of the form $v/n$
in $\Gamma$ (such a vector must exist since $\Gamma$ is discrete). Now let $w$ be an irreducible vector.
By the full rank assumption, we can find $d$ linearly independent vectors $v_1, \ldots, v_d$
in $\Gamma$ with $v_1 = w$, so in particular the volume $|v_1 \wedge \cdots \wedge v_d|$ of the parallelepiped
spanned by $v_1, \ldots, v_d$ is strictly positive. Since $\Gamma$ contains $\mathbf{Z}^d \cdot (v_1, \ldots, v_d)$, we
obtain an upper bound for the covolume:
\[ |v_1 \wedge \cdots \wedge v_d| \geq \mathrm{mes}(\mathbf{R}^d/\Gamma). \]
We now use the method of descent. If $\mathbf{Z}^d \cdot (v_1, \ldots, v_d)$ is equal to $\Gamma$ then we are
done. Otherwise, the half-open parallelepiped $\{\sum_{j=1}^d t_j v_j : 0 \leq t_j < 1\}$ generated
by the vectors $v_1, \ldots, v_d$, being a fundamental domain of $\mathbf{Z}^d \cdot (v_1, \ldots, v_d)$, must
contain a non-zero lattice point $x$ in $\Gamma$. Write $x = \sum_{j=1}^d t_j v_j$; note that at least
one of $t_2, \ldots, t_d$ must be non-zero, otherwise we would have $tw \in \Gamma$ for some
$0 < t < 1$, which (by the Euclidean algorithm) contradicts the irreducibility of $w$.
By permuting the indices $2, \ldots, d$ if necessary we may assume that $t_d > 0$. We
may also assume that $t_d \leq 1/2$, since we could replace $x$ by $v_1 + \cdots + v_d - x$
otherwise. Then the volume $|v_1 \wedge \cdots \wedge v_{d-1} \wedge x|$ is at most half that of $|v_1 \wedge
\cdots \wedge v_d|$, but is still non-zero. We thus replace $v_d$ by $x$ and repeat the above
argument. Because of our absolute lower bound on the volume of parallelepipeds,
this argument must eventually terminate, at which point we have found the desired
presentation for $\Gamma$. Note that this procedure will never alter $v_1$ and hence $v_1$ is
equal to $w$ as desired.














Corollary 3.5 (Splitting lemma) Let $\Gamma$ be a lattice of rank $k$, and let $v$ be an
irreducible vector in $\Gamma$. Then there exists a sub-lattice $\Gamma'$ of $\Gamma$ of rank $k - 1$ such
that $\Gamma$ is the direct sum of $\mathbf{Z} \cdot v$ and $\Gamma'$, i.e. $\Gamma = \mathbf{Z} \cdot v + \Gamma'$ and $\mathbf{Z} \cdot v \cap \Gamma' = \{0\}$.


Proof Apply Lemma 3.4 with $v_1 := v$, and set $\Gamma' := \mathbf{Z}^{k-1} \cdot (v_2, \ldots, v_k)$; the
claim then follows from the linear independence of $v_1, \ldots, v_k$.














Corollary 3.6 (Fundamental theorem of finitely-generated torsion-free
additive groups) Let $Z$ be a finitely generated torsion-free additive group. Then
$Z$ is isomorphic to $\mathbf{Z}^d$ for some $d \geq 0$.


Proof We shall use the homomorphism theorems (Exercise 3.1.1). Since
$Z$ is finitely generated, we may find elements $v_1, \ldots, v_n$ in $Z$ such that
$\mathbf{Z}^n \cdot (v_1, \ldots, v_n) = Z$. Now let $\Gamma$ be the set $\{m \in \mathbf{Z}^n : m \cdot (v_1, \ldots, v_n) = 0\}$; then
$\Gamma$ is a sub-lattice of $\mathbf{Z}^n$ and $Z$ is isomorphic to $\mathbf{Z}^n/\Gamma$. In particular, $\mathbf{Z}^n/\Gamma$ is
torsion-free. We shall show that this implies that $\mathbf{Z}^n/\Gamma$ is isomorphic to some $\mathbf{Z}^d$,
as desired. We induct on $n$, the case $n = 0$ being trivial. If $\Gamma = \{0\}$ we are done, so
suppose $\Gamma$ contains a non-zero vector $v \in \Gamma$, which we may assume without loss
of generality to be irreducible in $\Gamma$. It is also irreducible in $\mathbf{Z}^n$, for if $v = m \cdot w$ for
some $w \in \mathbf{Z}^n$ and $m > 1$, then $w + \Gamma$ would be a non-zero element of $\mathbf{Z}^n/\Gamma$
such that $m \cdot (w + \Gamma) = 0 + \Gamma$, contradicting the torsion-free assumption. By
Lemma 3.4 or Corollary 3.5, this implies that $\mathbf{Z}^n/(\mathbf{Z} \cdot v)$ is isomorphic to $\mathbf{Z}^{n-1}$.
Since $\mathbf{Z}^n/\Gamma$ is isomorphic to $(\mathbf{Z}^n/(\mathbf{Z} \cdot v))/(\Gamma/(\mathbf{Z} \cdot v))$, the claim then follows
from the induction hypothesis.














3.1.2 Quotients of lattices 


Let $G$ be a finitely generated additive group generated by $d$ elements $v_1, \ldots, v_d \in
G$. If we write $v := (v_1, \ldots, v_d)$, and let $\Gamma \subseteq \mathbf{Z}^d$ be the lattice $\Gamma := \{n \in \mathbf{Z}^d :
n \cdot v = 0\}$, it is easy to see that $G$ is isomorphic to the quotient $\mathbf{Z}^d/\Gamma$. Thus it is
of interest to understand the quotient of two lattices. A basic tool for doing so is


Theorem 3.7 (Smith normal form) Let $\Gamma$ and $\Gamma'$ be two lattices of full rank in
$\mathbf{R}^d$ such that $\Gamma'$ is a sub-lattice of $\Gamma$. Then there exist linearly independent vectors
$v_1, \ldots, v_d$ in $\Gamma$ such that
\[ \Gamma = \mathbf{Z}^d \cdot (v_1, \ldots, v_d) \]
and
\[ \Gamma' = \mathbf{Z}^d \cdot (N_1 v_1, \ldots, N_d v_d), \]
where $1 \leq N_1 \leq \cdots \leq N_d$ are positive integers such that $N_j$ divides $N_{j+1}$ for all
$j = 1, \ldots, d - 1$.


Note that by applying an invertible linear transformation one can set $(v_1, \ldots, v_d)$
equal to the standard basis $(e_1, \ldots, e_d)$, so that $\Gamma$ becomes just the standard lattice
$\mathbf{Z}^d$, while $\Gamma'$ is the sub-lattice of $\mathbf{Z}^d$ of vectors whose $j$th coordinate is a multiple
of $N_j$ for $j = 1, \ldots, d$.


Proof We induct on $d$. For $d = 0$ the statement is vacuously true, so suppose
$d \geq 1$ and the claim has already been proven for $d - 1$. Given any non-zero vector
$v \in \Gamma$, define the index of $v$ to be the largest positive integer $n$ such that $v/n \in \Gamma$;
note that the index is finite since $\Gamma$ is discrete. Note that the index of $v$ is $n$ if and
only if $v = nw$ for some irreducible vector $w$ in $\Gamma$.

Since $\Gamma'$ has full rank, it contains non-zero vectors, each of which has an
index. Let $N_1$ denote the minimum index of all such vectors. By the well-ordering
principle, this index is attained, and thus there exists an irreducible vector $v_1 \in \Gamma$
such that $N_1 v_1 \in \Gamma'$.

Using Lemma 3.4, we may apply an invertible linear transformation to map $\Gamma$
to $\mathbf{Z}^d$, in such a way that $v_1$ is now equal to the standard basis vector $e_1$. Now let
$(n_1, \ldots, n_d)$ be any vector in $\Gamma'$. Observe that $n_1, \ldots, n_d$ are integers; furthermore,
$n_1$ must be a multiple of $N_1$, otherwise by subtracting a multiple of $N_1 e_1$ we could
ensure that $0 < |n_1| < N_1$, which contradicts the definition of $N_1$ as the minimal index
of $\Gamma'$. Thus we may factorize $\Gamma' = N_1 \mathbf{Z} \cdot e_1 + \Gamma''$, where $\Gamma''$ is some sub-lattice
of $\mathbf{Z}^{d-1}$ (which we think of as the span of $e_2, \ldots, e_d$). Note that if $x \in \Gamma''$, then
$(N_1, x) \in \Gamma'$, and hence (since $(N_1, x)$ must have index at least $N_1$), $x$ must be a
multiple of $N_1$. Thus $\Gamma''$ actually lies in $N_1 \cdot \mathbf{Z}^{d-1}$, and we may therefore write
$\Gamma' = N_1(\mathbf{Z} \cdot e_1 + \Gamma''')$ for some sub-lattice $\Gamma'''$ of $\mathbf{Z}^{d-1}$. Note that $\Gamma'''$ must have
rank $d - 1$ since $\Gamma'$ has rank $d$.

We now invoke the inductive hypothesis, and, by applying an invertible linear
transformation to $\mathbf{Z}^{d-1}$ if necessary, we may assume that
\[ \Gamma''' = \{(n_2 M_2, \ldots, n_d M_d) : n_2, \ldots, n_d \in \mathbf{Z}\} \]
for some $1 \leq M_2 \leq \cdots \leq M_d$ such that $M_j$ divides $M_{j+1}$ for all $j = 2, \ldots, d - 1$.
The claim follows by setting $N_j := N_1 M_j$ for $j = 2, \ldots, d$.
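For a sub-lattice $\Gamma'$ of $\Gamma = \mathbf{Z}^d$ given by an explicit integer basis, the integers $N_1, \ldots, N_d$ of Theorem 3.7 can be computed without following the induction above. The sketch below instead uses the classical determinantal divisor characterization (a swapped-in technique, not the argument of the proof): the product $N_1 \cdots N_k$ equals the greatest common divisor of all $k \times k$ minors of the basis matrix. It assumes the rows of `basis` are linearly independent, and all names are illustrative.

```python
from itertools import combinations
from math import gcd

def det(m):
    """Integer determinant by Laplace expansion (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j in range(len(m)):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def invariant_factors(basis):
    """Return [N_1, ..., N_d] for the full rank sub-lattice of Z^d
    generated by the rows of `basis` (assumed linearly independent)."""
    d = len(basis)
    divisors = [1]                      # divisors[k] = gcd of all k x k minors
    for k in range(1, d + 1):
        g = 0
        for rows in combinations(range(d), k):
            for cols in combinations(range(d), k):
                sub = [[basis[r][c] for c in cols] for r in rows]
                g = gcd(g, abs(det(sub)))
        divisors.append(g)
    return [divisors[k] // divisors[k - 1] for k in range(1, d + 1)]

print(invariant_factors([[2, 4, 4], [-6, 6, 12], [10, -4, -16]]))
print(invariant_factors([[2, 0], [0, 3]]))
```

The second example shows the normalization at work: the lattice with basis $(2, 0), (0, 3)$ has invariant factors $1, 6$ rather than $2, 3$, since $N_1$ must divide $N_2$.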














We can now obtain the well-known classification of finite and finitely generated 
additive groups: 


Corollary 3.8 (Fundamental theorem of finite additive groups) Every finite
additive group $G$ is isomorphic to the direct sum of a finite number of cyclic
groups $\mathbf{Z}_N = \mathbf{Z}/(N \cdot \mathbf{Z})$.


Proof Let $g_1, \ldots, g_d$ be a finite set of generators for $G$. Then the map $\phi : \mathbf{Z}^d \to G$
defined by $\phi(n) := n \cdot (g_1, \ldots, g_d)$ is a surjection, and thus $G$ is isomorphic to
$\mathbf{Z}^d/\phi^{-1}(0)$, which is a subgroup of $\mathbf{R}^d/\phi^{-1}(0)$. The kernel $\phi^{-1}(0)$ is clearly a
lattice of some rank $0 \leq k \leq d$, and hence by Lemma 3.4 is generated by $k$ linearly
independent vectors $v_1, \ldots, v_k$ in $\mathbf{Z}^d$. Observe that we must have full rank $k = d$,
otherwise $\mathbf{Z}^d/\phi^{-1}(0)$ (and hence $G$) will be infinite. Using the Smith normal
form, we can after applying an isomorphism write $\phi^{-1}(0)$ as the lattice generated
by $N_1 e_1, \ldots, N_d e_d$ for some integers $N_1, \ldots, N_d \geq 1$; this makes $G$ isomorphic
to $\mathbf{Z}/N_1\mathbf{Z} \oplus \cdots \oplus \mathbf{Z}/N_d\mathbf{Z}$, as desired (indeed we even obtain a normal form
in which $N_j$ divides $N_{j+1}$ for $j = 1, \ldots, d - 1$).
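As a sanity check on the normal form, one can compare two direct sums of cyclic groups through the multiset of their element orders, which is an isomorphism invariant. The sketch below (illustrative names, small moduli only) confirms for instance that $\mathbf{Z}_4 \oplus \mathbf{Z}_6$ and its normal form $\mathbf{Z}_2 \oplus \mathbf{Z}_{12}$ have the same order statistics, while $\mathbf{Z}_{24}$ does not.

```python
from collections import Counter
from itertools import product
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def order_profile(moduli):
    """Count the elements of each order in Z_{N_1} (+) ... (+) Z_{N_d}."""
    profile = Counter()
    for x in product(*(range(N) for N in moduli)):
        order = 1
        for x_i, N in zip(x, moduli):
            # The order of x_i in Z_N is N / gcd(x_i, N); the order of the
            # tuple is the least common multiple of the coordinate orders.
            order = lcm(order, N // gcd(x_i, N))
        profile[order] += 1
    return profile

print(order_profile((4, 6)) == order_profile((2, 12)))
print(order_profile((4, 6)) == order_profile((24,)))
```

Equality of order profiles is only a necessary condition for isomorphism in general; the corollary itself is what guarantees that the two groups in the first comparison are in fact isomorphic.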














Corollary 3.9 (Fundamental theorem of finitely generated additive groups)
Every finitely generated additive group $G$ is isomorphic to the direct sum of a
finite number of cyclic groups $\mathbf{Z}/(N \cdot \mathbf{Z})$, and a lattice $\mathbf{Z}^d$ for some $d \geq 0$.


Proof Let $\tilde{G} := \{x \in G : nx = 0 \text{ for some } n > 0\}$ be the torsion group of $G$;
then by Corollary 3.8, $\tilde{G}$ is the direct sum of cyclic groups. The quotient group
$G/\tilde{G}$ is torsion-free and is thus isomorphic to $\mathbf{Z}^d$ for some $d \geq 0$ by Corollary 3.6.
If we let $\tilde{e}_1, \ldots, \tilde{e}_d$ be arbitrary representatives in $G$ of the standard basis $e_1, \ldots, e_d$
of $\mathbf{Z}^d$, we thus see that $G$ is the direct sum of $\tilde{G}$ and $\mathbf{Z} \cdot \tilde{e}_1, \ldots, \mathbf{Z} \cdot \tilde{e}_d$, and the
claim follows.














Exercises 


3.1.1 (Homomorphism theorems) If $\phi : Z \to Z'$ is a homomorphism between
groups, show that the range $\phi(Z)$ is a group which is isomorphic to
the quotient group $Z/\phi^{-1}(0)$. If $G, H$ are subgroups of $Z$, show that
$(G + H)/G$ is isomorphic to $H/(G \cap H)$. If furthermore $G \subseteq H$, show
that $H/G$ is a subgroup of $Z/G$ and that $(Z/G)/(H/G)$ is isomorphic
to $Z/H$. If $G'$ is a subgroup of $Z'$, show that $(Z \oplus Z')/(G \oplus G')$ is
isomorphic to $(Z/G) \oplus (Z'/G')$.

3.1.2 (Cauchy's theorem) Show that if $G$ is a subgroup of a finite additive
group $Z$, then $|Z/G| = |Z|/|G|$ (and in particular $|G|$ must divide $|Z|$).
By considering the groups $\langle x\rangle$ for various $x \in Z$, conclude that every finite
additive group $Z$ is a $|Z|$-torsion group; in particular, $\mathrm{ord}(x)$ divides $|Z|$
for all $x \in Z$.

3.1.3 Show that if $x$ is any element of an additive group $Z$, then the group $\langle x\rangle =
\mathbf{Z}\cdot x$ has cardinality $\mathrm{ord}(x)$. More generally, if $v = (v_1,\ldots,v_d) \in Z^d$,
show that the group $\mathbf{Z}^d\cdot v$ has cardinality at most $\mathrm{ord}(v_1)\cdots\mathrm{ord}(v_d)$, but
at least as large as the least common multiple of the $\mathrm{ord}(v_j)$.

3.1.4 Let $Z$ be an additive group. Show that $Z$ is an $N$-torsion group if and
only if for every $x \in Z$, the torsion of $x$ is a divisor of $N$. Show that $Z$ is
torsion-free if and only if $Z$ contains no finite subgroups other than the
trivial subgroup $\{0\}$.

3.1.5 Let $Z = Z_1 \oplus Z_2$ be a direct sum of additive groups and $r \ge 1$. Show that
$Z$ is torsion-free (resp. $r$-torsion) if and only if $Z_1$ and $Z_2$ are torsion-free
(resp. $r$-torsion).

3.1.6 Prove that Q and R are not finitely generated. 

3.1.7 If $x, y$ are elements of an additive group $Z$ with finite order, show that
$x + y$ also has finite order, and that $\mathrm{ord}(x + y)$ divides the least common
multiple of $\mathrm{ord}(x)$ and $\mathrm{ord}(y)$. Conclude that the set $\mathrm{tor}(Z) := \{x \in Z :
\mathrm{ord}(x) < \infty\}$ is a torsion group; we refer to it as the torsion subgroup of
$Z$. It is clearly the largest subgroup of $Z$ which is a torsion group. Show
that the quotient group $Z/\mathrm{tor}(Z)$ is torsion-free, and is in fact the largest
quotient which is torsion-free (in the sense that all other torsion-free
quotients are quotients of $Z/\mathrm{tor}(Z)$).

3.1.8 Show that Corollary 3.5 fails whenever $v$ is not irreducible.


3.2 Progressions 


We now study a basic example of an additive set, namely that of a generalized 
arithmetic progression (or progression for short), as defined in Definition 0.2. 
These will be model examples of additive sets with large amounts of additive 
structure; they can be viewed as a hybrid between a lattice and a convex set. (For 
a more quantitative realization of this heuristic, see Lemma 3.36 below.) 

Note that progressions with the same set of basis vectors add very easily:

\[ (a + [0, N]\cdot v) + (a' + [0, N']\cdot v) = (a + a') + [0, N + N']\cdot v \tag{3.1} \]

(so in particular the rank and basis vectors do not change), whereas progressions
with different basis vectors add via the formula

\[ (a + [0, N]\cdot v) + (a' + [0, N']\cdot v') = (a + a') + [0, N \oplus N']\cdot(v \oplus v'). \tag{3.2} \]

Note the progression on the right-hand side of (3.2) is likely to be highly improper
if $v$ and $v'$ share some basis vectors in common. Also one can replace the box
$[0, N]$ by another one and also obtain a progression:

\[ a + [N, M]\cdot v = (a + N\cdot v) + [0, M - N]\cdot v. \]

Similarly if one uses boxes such as $[N, M)$, etc. In particular, the negation of a
progression is also a progression:

\[ -(a + [0, N]\cdot v) = (-a) + [0, N]\cdot(-v) = (-a - N\cdot v) + [0, N]\cdot v. \tag{3.3} \]




From this and (3.2) we see that the sum or difference of two progressions is again 
a progression. Finally, we make the easy observation that the Cartesian product of 
two progressions is again a progression. 

We now show that, up to errors of $O(1)^d$, progressions of rank $d$ are
essentially closed under addition.


Lemma 3.10 Let $P = a + [0, N]\cdot v$ be a progression of rank $d$ in an additive
group $Z$; we do not require that $P$ be proper (see Definition 0.2). Then for any
integers $n < m$ and any $b \in Z$, we can cover $b + [nN, mN]\cdot v$ by $(m-n)^d$ translates
of $P$. In particular for any $n, m \ge 0$ with $(n, m) \ne (0, 0)$, we can cover $nP - mP$
by $(n + m)^d$ translates of $P$, and in particular

\[ |nP - mP| \le (n + m)^d |P|. \]

Furthermore, $nP - mP$ is also a progression of rank $d$ and volume at most
$\mathrm{vol}(nP - mP) \le (n+m)^d\,\mathrm{vol}(P)$.


Proof The first claim is clear since
\[ [n\cdot N, m\cdot N]\cdot v = [0, N]\cdot v + [(n,\ldots,n), (m-1,\ldots,m-1)]\cdot(N_1v_1,\ldots,N_dv_d). \]
From (3.1) and (3.3) we have

\[ nP - mP = (na - ma - mN\cdot v) + [0, (n+m)N]\cdot v \]











from which the remaining claims follow. 
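As a sanity check, the identity for $nP - mP$ and the cardinality bound of Lemma 3.10 can be verified by brute force in the group of integers. This is only an illustrative sketch: the helper names `progression`, `iterated_sum` and `sum_minus`, and the particular numbers chosen, are assumptions made for the example.

```python
from itertools import product


def progression(a, N, v):
    """The set a + [0, N].v inside the integers, for dimension N and basis v."""
    return {a + sum(n * vi for n, vi in zip(ns, v))
            for ns in product(*(range(Ni + 1) for Ni in N))}


def iterated_sum(P, n):
    """The n-fold sum set nP, with 0P = {0}."""
    S = {0}
    for _ in range(n):
        S = {s + p for s in S for p in P}
    return S


def sum_minus(P, n, m):
    """The set nP - mP."""
    return {x - y for x in iterated_sum(P, n) for y in iterated_sum(P, m)}


a, N, v = 5, (3, 4), (1, 10)      # a rank 2 progression in Z
n, m, d = 2, 1, len(N)
P = progression(a, N, v)

lhs = sum_minus(P, n, m)
# Right-hand side of the identity in the proof: (na - ma - mN.v) + [0, (n+m)N].v
base = n * a - m * a - m * sum(Ni * vi for Ni, vi in zip(N, v))
rhs = progression(base, tuple((n + m) * Ni for Ni in N), v)

print(lhs == rhs, len(lhs), (n + m) ** d * len(P))
```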





From this lemma we see in particular that if $P$ is a symmetric progression of rank
$d$ and contains the origin (e.g. if $P = [-N, N]\cdot v$), then $P$ is a $2^d$-approximate
group in the sense of Definition 2.25. Indeed one can think of (symmetric)
progressions of small rank as substitutes for subgroups in torsion-free settings (since
torsion-free groups cannot contain finite subgroups). They also are the arithmetic 
analogue of boxes (or more generally, parallelepipeds) in Euclidean space, and 
in fact many of the results from real-variable harmonic analysis regarding cover- 
ing by boxes (in physical space, Fourier space, or both) will have analogues for 
progressions. 

In the special case when the rank d is equal to 1, a generalized arithmetic 
progression is the same as an ordinary arithmetic progression (or arithmetic 
progression for short) 


\[ P = a + [0, N]\cdot v = \{a + nv : 0 \le n \le N\} \]


with base point a € Z, basis vector or step v € Z, and length N + 1. Note again 
that the cardinality of P may be less than N + 1 if P is not proper, though in a 
torsion-free group this is only possible if the step v is zero. 




We record a trivial lemma that asserts that the sum set of a progression and a 
small set can be contained (somewhat inefficiently) in another progression. 


Lemma 3.11 If $P$ is a progression of rank $d$, and $P + w_1,\ldots,P + w_K$ are
translates of $P$, then all the translates $P + w_1,\ldots,P + w_K$ can be contained inside a
single progression of rank $d + K - 1$ and volume $2^{K-1}\mathrm{vol}(P)$.


Proof Write $P = a + [0, N]\cdot v$. By translation invariance we may set $w_K =
0$. Then the claim follows by using the progression $a + [0, N]\cdot v + [0, 1]^{K-1}\cdot
(w_1,\ldots,w_{K-1})$.














Thus if one adds a small number of elements to a progression, one can still place
the combined set inside a progression of slightly larger rank and volume, although
the volume can grow exponentially in the number $K$ of translates. This is unavoidable: see Exercise 3.2.2.
Because of this exponential loss, it is sometimes better not to invoke this lemma, 
and deal with multiple shifts of a single progression rather than trying to contain 
everything inside a single progression. Note that we have not guaranteed that the 
progressions in Lemma 3.11 are proper; we will return to this point in Section 3.6. 
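The construction in the proof of Lemma 3.11 is easy to test numerically. The sketch below, with hypothetical helper names and arbitrarily chosen numbers, builds the larger progression by appending the shifts as extra basis vectors with side length 1, and checks both the containment and the volume count $2^{K-1}\mathrm{vol}(P)$.

```python
from itertools import product
from math import prod


def progression(a, N, v):
    """The set a + [0, N].v inside the integers."""
    return {a + sum(n * vi for n, vi in zip(ns, v))
            for ns in product(*(range(Ni + 1) for Ni in N))}


def volume(N):
    """Volume of a progression of dimension N, i.e. the size of the box [0, N]."""
    return prod(Ni + 1 for Ni in N)


a, N, v = 0, (4, 2), (1, 7)
P = progression(a, N, v)
w = [100, 250, 0]                 # K = 3 shifts, normalised so that w_K = 0
K = len(w)

# a + [0, N].v + [0, 1]^(K-1).(w_1, ..., w_{K-1})
big_N = N + (1,) * (K - 1)
big_v = v + tuple(w[:K - 1])
big = progression(a, big_N, big_v)

translates = [{p + wi for p in P} for wi in w]
print(all(t <= big for t in translates), volume(big_N), 2 ** (K - 1) * volume(N))
```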


Exercises 


3.2.1 Let $N = (N_1,\ldots,N_d)$ be a collection of non-negative integers. Show that
every proper ordinary arithmetic progression of length $(N_1 + 1)\cdots(N_d +
1)$ is equal (as a set) to a proper generalized arithmetic progression of
dimension $N$. (This example shows that the rank of a progression cannot
be uniquely determined from the set of its elements, even if we restrict
the progression to be proper.)

3.2.2 Let $K \ge 1$ and $d \ge 0$ be integers, and $P = a + [0, N]\cdot v$ be a rank $d$
progression in an additive group $Z$ for some basis vectors $v = (v_1,\ldots,v_d)$,
and let $X = \{e_1,\ldots,e_K\}$ be a set of $K$ elements in $Z$. Suppose that the
elements $v_1,\ldots,v_d, e_1,\ldots,e_K$ are linearly independent over $\mathbf{Z}$. Show
that any progression which contains $P + X$ must necessarily have rank
at least $d + K - 1$ and volume at least $2^{K-1}\mathrm{vol}(P)$, which shows that
Lemma 3.11 is sharp.

3.2.3 Show that in a torsion-free additive group, the intersection of two ordinary
arithmetic progressions is again an ordinary arithmetic progression. What
happens if the torsion-free hypothesis is removed? What happens if one
or both of the progressions is allowed to have rank greater than one?

3.2.4 Show that every finite additive group is also a proper progression.

3.2.5 Let $P$ be a progression of rank $d$. Show that $P$ contains an arithmetic
progression $Q$ with $|Q| \ge |P|^{1/d}$, and furthermore that $Q$ is proper if $P$
is, and $Q$ can be chosen to be symmetric around the origin if $P$ is.




3.2.6 Let $P$ be a proper progression of rank $d$, and let $A$ be a subset of $P$ such
that $|A| \le \varepsilon|P|$ for some $0 < \varepsilon < 1$. Show that $P\backslash A$ contains a proper
progression $Q$ of rank $d$ with $|Q| \ge C^{-d}/\varepsilon$ for some absolute constant
$C$.

3.2.7 Let $A$ be an additive set in an ambient group $Z$, and let $v \in Z$. Show
that $|(A + v)\backslash A| \le 1$ if and only if $A$ is equal to a proper arithmetic
progression of step $v$, union a finite (possibly zero) number of translates
of the group $\langle v\rangle$. In particular, if $|A| < \mathrm{ord}(v)$, then $|(A + v)\backslash A| > 0$, and
$|(A + v)\backslash A| = 1$ if and only if $A$ is a proper arithmetic progression of
step $v$.


3.3 Convex bodies 


We now review some of the theory of convex bodies in $\mathbf{R}^d$, which are in some
sense the continuous analogue of generalized arithmetic progressions. This is of
course a vast field, and we shall restrict ourselves to just a small sample of
results, relating to the additive theory of such sets, to covering lemmas, and to the
relationship between addition and volume.

We shall use $\mathrm{mes}(A)$ to denote the volume of a set $A$ in $\mathbf{R}^d$; to avoid issues
with measurability we shall mostly concern ourselves with bounded open sets $A$.
If $A \subseteq \mathbf{R}^d$ and $\lambda \in \mathbf{R}$, we use $\lambda\cdot A$ to denote the dilation $\lambda\cdot A := \{\lambda x : x \in A\}$.
Observe that $\mathrm{mes}(\lambda\cdot A) = |\lambda|^d\,\mathrm{mes}(A)$.

Recall that a set $A$ in $\mathbf{R}^d$ is convex if we have $(1 - \theta)x + \theta y \in A$ whenever
$x, y \in A$ and $0 \le \theta \le 1$; equivalently, a set is convex if and only if

\[ a\cdot A + b\cdot A = (a + b)\cdot A \]

for all real $a, b \ge 0$ (Exercise 3.3.3). In particular we have $nA = n\cdot A$ for any
integer $n \ge 1$. We call $A$ a convex body if it is convex, open, non-empty, and bounded.
In particular we see that if $A$ is a convex body, then

\[ \mathrm{mes}(A + A) = \mathrm{mes}(2\cdot A) = 2^d\,\mathrm{mes}(A), \tag{3.4} \]

so convex bodies have small doubling constant. As for $A - A$, we can use


Lemma 3.12 [297] For any bounded open subsets $A, B, C$ of $\mathbf{R}^d$ (not necessarily
convex), we have

\[ \mathrm{mes}(A - C)\,\mathrm{mes}(B) \le \mathrm{mes}(A - B)\,\mathrm{mes}(B - C). \]




This is proven by modifying the proof of Lemma 2.6 appropriately and is left
as an exercise. From this Lemma (with $A = C$ and $B = -A$) and (3.4) we obtain

\[ \mathrm{mes}(A - A) \le 4^d\,\mathrm{mes}(A); \tag{3.5} \]

compare these bounds with Lemma 3.10. For a slight refinement of (3.5),
see Exercise 3.4.6. In the converse direction, the Brunn–Minkowski inequality
(Theorem 3.16 below) will give $\mathrm{mes}(A - A) \ge 2^d\,\mathrm{mes}(A)$.
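For the reader's convenience, here is the short computation behind (3.5), written out under the substitution indicated in the text ($C := A$ and $B := -A$ in Lemma 3.12, for a convex body $A$); it uses only (3.4) and the fact that negation preserves volume.

```latex
\begin{align*}
\operatorname{mes}(A - A)\,\operatorname{mes}(-A)
  &\le \operatorname{mes}\bigl(A - (-A)\bigr)\,\operatorname{mes}\bigl((-A) - A\bigr) \\
  &= \operatorname{mes}(A + A)\,\operatorname{mes}\bigl(-(A + A)\bigr)
   = \operatorname{mes}(A + A)^2
   = 4^d \operatorname{mes}(A)^2 .
\end{align*}
% Dividing by mes(-A) = mes(A) > 0 gives mes(A - A) <= 4^d mes(A).
```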

Call a convex body A symmetric if A = — A; thus for us symmetry will always 
be with respect to the origin. The following theorem of John essentially classifies 
all convex bodies (symmetric and non-symmetric) up to a (dimension-dependent) 
constant factor. 


Theorem 3.13 (John's theorem) [194] Let $A$ be a convex body in $\mathbf{R}^d$. Then there
exists an invertible linear transformation $T : \mathbf{R}^d \to \mathbf{R}^d$ and a point $x_0 \in A$
such that

\[ B_d \subseteq T(A - x_0) \subseteq d\cdot B_d, \]

where $B_d$ is the unit ball $\{(x_1,\ldots,x_d) \in \mathbf{R}^d : x_1^2 + \cdots + x_d^2 < 1\}$. If $A$ is
symmetric, then we can improve these inclusions to

\[ B_d \subseteq T(A) \subseteq \sqrt{d}\cdot B_d. \]

The constants $d$ and $\sqrt{d}$ are sharp; see the exercises.


Proof We will use a variational argument. Define an ellipsoid to be any set $E$ of
the form $E = L(B_d) + x_0$, where $B_d$ is the unit ball, $x_0 \in \mathbf{R}^d$, and $L$ is a (possibly
degenerate) linear transformation on $\mathbf{R}^d$; we allow the ellipsoid to be degenerate for
compactness reasons. Since $A$ is open and bounded, it is easy to see that the set of
all ellipsoids $E$ contained in $A$ is a compact set (with respect to the usual topology
on $L$ and $x_0$). Also the volume of the ellipsoid $E$ is $\mathrm{mes}(E) = |\det(L)|\,\mathrm{mes}(B_d)$, which is
clearly a continuous function of $L$. Thus there exists an ellipsoid $E = L(B_d) + x_0$
in $A$ which maximizes the volume $\mathrm{mes}(E)$; since $A$ is open, this volume is
non-zero, and hence $L$ is invertible. By applying $L^{-1}$ if necessary (observing that
the conclusion of the lemma is invariant under invertible linear transformations)
we may thus assume that $E$ is a translate $E = B_d + y_0$ of the unit ball, where
$y_0 = L^{-1}(x_0)$.

Let us now restrict to the case where $A$ is symmetric. Observe that if $A$ contains
$B_d + y_0$ then it also contains $B_d - y_0$ by symmetry, and hence contains $B_d$, which
is in the convex hull of $B_d + y_0$ and $B_d - y_0$. To conclude the proof of the lemma in
this case we need to show that $A$ is contained in $\sqrt{d}\cdot B_d$. Suppose for contradiction
that $A$ was not contained in $\sqrt{d}\cdot B_d$; without loss of generality (and using the
hypothesis that $A$ is open) we may then suppose that $re_1 \in A$ for some $r > \sqrt{d}$,
where $e_1$ is the first basis vector. Observe now from elementary geometry that if
$\omega$ is any point on the boundary of $B_d$ making an angle $\angle(\omega, e_1) < \arctan(\sqrt{r^2 - 1})$,
then the line segment connecting $\omega$ to $re_1$ is disjoint from (and not tangent to) $B_d$,
and, since $B_d$ and $re_1$ both lie in the convex set $A$, we thus see that $\omega$ also lies in
the open set $A$. By symmetry, the same is true if $\angle(\omega, -e_1) < \arctan(\sqrt{r^2 - 1})$.

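We now perturb the ball $B_d$ by an epsilon. Let $\delta > 0$ be a small number,
let $\varepsilon > 0$ be an even smaller one, and consider the ellipsoid $L_{\varepsilon,\delta}(B_d)$, where

\[ L_{\varepsilon,\delta}(x_1,\ldots,x_d) := ((1 + (d - 1 + \delta)\varepsilon)x_1, (1 - \varepsilon)x_2, \ldots, (1 - \varepsilon)x_d). \]

When $\varepsilon = 0$, $L_{\varepsilon,\delta}(B_d)$ is just $B_d$. Now consider how $L_{\varepsilon,\delta}(B_d)$ evolves in $\varepsilon$. The
determinant of this transformation is $(1 + (d - 1 + \delta)\varepsilon)(1 - \varepsilon)^{d-1}$, which has
a positive $\varepsilon$-derivative at $\varepsilon = 0$. Thus $L_{\varepsilon,\delta}(B_d)$ has larger volume than $B_d$ for
sufficiently small $\varepsilon$ (depending on $\delta$). Now we check which points on the surface of
$L_{\varepsilon,\delta}(B_d)$ expand away from the origin, and which ones contract. A simple
computation shows that for any $\omega = (\omega_1,\ldots,\omega_d)$ on the boundary of $B_d$, the derivative

\[ \frac{d}{d\varepsilon}\,\|L_{\varepsilon,\delta}(\omega)\|^2\Big|_{\varepsilon=0}, \]

where $\|(y_1,\ldots,y_d)\|^2 := y_1^2 + \cdots + y_d^2$, is negative unless

\[ (d - 1 + \delta)\omega_1^2 - \omega_2^2 - \cdots - \omega_d^2 \ge 0, \]

or in other words that

\[ \angle(\omega, \pm e_1) \le \arctan(\sqrt{d - 1 + \delta}). \]

But if $\delta$ is small enough depending on $r$, this region is contained entirely within
the interior of $A$ by the previous discussion. Thus for $\varepsilon$ small enough $L_{\varepsilon,\delta}(B_d)$ is
completely contained inside $A$ but has larger volume, contradicting the maximality
of $B_d$, and we are done.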

Now suppose that $A$ is not symmetric. In this case we may translate so that
$y_0 = 0$. Thus again we have $B_d \subseteq A$, and the task is to show that $A \subseteq d\cdot B_d$.
Suppose again for contradiction that $re_1 \in A$ for some $r > d$; again this means
that every point $\omega$ in the boundary of $B_d$ with $\angle(\omega, e_1) < \arctan(\sqrt{r^2 - 1})$ will lie
in the interior of $A$. Now let $\delta, \varepsilon > 0$ and consider the ellipsoid

\[ L_{\varepsilon,\delta}(B_d) + (d - 1 + \delta)\varepsilon e_1; \]

again, this ellipsoid has larger volume than $B_d$ if $\varepsilon$ is sufficiently small. Also, we
see that

\[ \frac{d}{d\varepsilon}\,\|L_{\varepsilon,\delta}(\omega) + (d - 1 + \delta)\varepsilon e_1\|^2\Big|_{\varepsilon=0} \]

is negative unless

\[ (d - 1 + \delta)\omega_1^2 + (d - 1 + \delta)\omega_1 - \omega_2^2 - \cdots - \omega_d^2 \ge 0, \]

which can be rewritten (using $\|\omega\| = 1$) as

\[ ((d + \delta)\omega_1 - 1)(\omega_1 + 1) \ge 0, \]

or equivalently

\[ \angle(\omega, e_1) \le \arctan(\sqrt{(d + \delta)^2 - 1}). \]

We now argue as in the symmetric case to obtain again the desired contradiction,
if $\delta$ is chosen so that $d + \delta < r$.
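The two "simple computations" invoked in this proof can be written out as follows. This is a worked sketch in which the abbreviation $c := d - 1 + \delta$ is introduced only for this display, and $\omega$ is a unit vector, so that $\omega_2^2 + \cdots + \omega_d^2 = 1 - \omega_1^2$.

```latex
\begin{align*}
\|L_{\varepsilon,\delta}(\omega)\|^2
  &= (1 + c\varepsilon)^2 \omega_1^2 + (1 - \varepsilon)^2 (\omega_2^2 + \cdots + \omega_d^2), \\
\frac{d}{d\varepsilon} \|L_{\varepsilon,\delta}(\omega)\|^2 \Big|_{\varepsilon = 0}
  &= 2\bigl(c\,\omega_1^2 - \omega_2^2 - \cdots - \omega_d^2\bigr), \\[4pt]
\|L_{\varepsilon,\delta}(\omega) + c\varepsilon e_1\|^2
  &= \bigl((1 + c\varepsilon)\omega_1 + c\varepsilon\bigr)^2
     + (1 - \varepsilon)^2 (\omega_2^2 + \cdots + \omega_d^2), \\
\frac{d}{d\varepsilon} \|L_{\varepsilon,\delta}(\omega) + c\varepsilon e_1\|^2 \Big|_{\varepsilon = 0}
  &= 2\bigl(c\,\omega_1^2 + c\,\omega_1 - (1 - \omega_1^2)\bigr)
   = 2\bigl((c + 1)\omega_1 - 1\bigr)(\omega_1 + 1).
\end{align*}
% With c + 1 = d + delta, the last expression is non-negative exactly when
% omega_1 >= 1/(d + delta), i.e. when the angle to e_1 is at most
% arctan(sqrt((d + delta)^2 - 1)).
```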














As a corollary of Theorem 3.13 we see that if $A$ is a convex body, we can cover
$A + A$ or $A - A$ by a relatively small number of copies of $A$:

\[ A + A \text{ can be covered by } O(d)^d \text{ translates of } A. \tag{3.6} \]

This follows immediately from the geometric observation that $d\cdot B_d + d\cdot B_d =
2d\cdot B_d$ can be covered by $O(d)^d$ translates of $B_d$. If $A$ is symmetric, we can
improve this somewhat. In the special case when $A$ is a cube or a box, it is clear
that $A + A$ can be covered by $2^d$ translates of $A$ (cf. Lemma 3.10), but one cannot
hope for this in general; for instance if $A$ is a disk in $\mathbf{R}^2$ then one needs six copies
of $A$ to cover $A + A$. In the general case, we will need the following continuous
version of Lemma 2.14:








Lemma 3.14 (Ruzsa's covering lemma) [300], [250] For any bounded subsets
$A, B$ of $\mathbf{R}^d$ with positive measure (not necessarily convex), we can cover $B$ by at
most $\min\left(\frac{\mathrm{mes}(A + B)}{\mathrm{mes}(A)}, \frac{\mathrm{mes}(A - B)}{\mathrm{mes}(A)}\right)$ translates of $A - A$.








The proof of this lemma is nearly identical to that of Lemma 2.14 and is left as
an exercise. As a consequence we can improve (3.6) for symmetric convex bodies:


Corollary 3.15 Let $A \subset \mathbf{R}^d$ be a convex body, and let $\lambda, \mu > 0$ be real. Then $\lambda\cdot A$
can be covered by at most $(\lambda + 1)^d$ translates of $A - A$, and $\lambda\cdot A - \mu\cdot A$ can be
covered by $(2\max(\lambda, \mu) + 1)^d$ translates of $A - A$. If $A$ is symmetric, then $\lambda\cdot A$
can also be covered by $(2\lambda + 1)^d$ translates of $A$.


Proof The first claim follows from Lemma 3.14 since $\mathrm{mes}(\lambda\cdot A + A) =
(\lambda + 1)^d\,\mathrm{mes}(A)$. To prove the second claim, we may take $\lambda \ge \mu$. The first claim
implies that $2\lambda\cdot A$ can be covered by $(2\lambda + 1)^d$ translates of $A - A = 2\cdot A$, and
the third claim follows by rescaling by $1/2$. Finally, the second claim follows by
applying the third claim to $A - A$.
















Observe that all the bounds obtained here tend to be exponential in d or worse. 
Thus when using the theory of convex bodies to obtain explicit estimates, it is often 
important to keep the dimension d as low as possible, even at the cost of making 
some other parameters larger than would otherwise be necessary. See [250] for 
further discussion of sum set and covering estimates for convex bodies. 

We have not yet seen what happens to the sum or difference of two unrelated 
convex bodies A and B. The relationship here is given by the Brunn—Minkowski 
inequality, which we turn to next. 


Exercises 


3.3.1 Prove Lemma 3.12. 

3.3.2 Prove Lemma 3.14. 

3.3.3 Verify that the two definitions of convexity given are indeed equivalent. 

3.3.4 Let $A$ be an open bounded subset of $\mathbf{R}^d$. Show that $A$ is convex if and
only if $2A = 2\cdot A$, and that $A$ is convex and symmetric if and only if
$2A = -2\cdot A$.

3.3.5 For any $s > 0$ let $\Gamma(s) := \int_0^\infty e^{-x}x^{s-1}\,dx$ denote the Gamma function.
Show that $\Gamma(s + 1) = s\Gamma(s)$ for all $s > 0$, that $\Gamma(d) = (d - 1)!$ for all
integers $d \ge 1$, that $\Gamma(1/2) = \sqrt{\pi}$, and we have the Stirling formula

\[ \log\Gamma(s) = s\log s - s + O(\log s) \tag{3.7} \]

for all large $s$. (Hint: use (1.52) and the monotonicity of the $\Gamma$ function.)

3.3.6 Let $B_d$ be the unit ball in $\mathbf{R}^d$. By evaluating the integral $\int_{\mathbf{R}^d} e^{-|x|^2}\,dx$ in
both Cartesian and polar coordinates, and using the preceding exercise,
establish the volume formula

\[ \mathrm{mes}(B_d) = \frac{\Gamma(3/2)^d\,2^d}{\Gamma(d/2 + 1)} = (2\pi e + o(1))^{d/2}\,d^{-d/2}. \tag{3.8} \]

3.3.7 Let $O_d$ be the octahedron given by the convex hull of $\pm e_1,\ldots,\pm e_d$
in $\mathbf{R}^d$. Show that $\mathrm{mes}(O_d) = 2^d/d! = (2e + o(1))^d\,d^{-d}$. Thus in large
dimension the octahedron becomes considerably smaller than the
circumscribing ball $B_d$ which contains it, which in turn is considerably
smaller than the circumscribing cube.

3.3.8 Show that the constants $d$ and $\sqrt{d}$ in Theorem 3.13 cannot be improved.
(For the non-symmetric case, take $A$ to be a $d$-simplex (the convex hull
of $d + 1$ points in general position in $\mathbf{R}^d$); for the symmetric case, take $A$ to be a cube.)

3.3.9 If $A$ and $A'$ are two symmetric convex bodies in $\mathbf{R}^d$, show that there exists
an invertible linear transformation $T : \mathbf{R}^d \to \mathbf{R}^d$ such that

\[ A \subseteq T(A') \subseteq d\cdot A. \]




State and prove a similar result in the case when $A$ and $A'$ are not
necessarily symmetric.

3.3.10 Let $A, B$ be open bounded sets. Show that

\[ \mathrm{mes}((A - A) \cap (B - B)) \ge \frac{\mathrm{mes}(A)\,\mathrm{mes}(B)}{\mathrm{mes}(A \pm B)} \]

for either choice of sign $\pm$, by developing a continuous analogue of the
arguments used to prove (2.8). (Alternatively, one can try to discretize $A$
and $B$ to replace them with finite sets, and then use (2.8) directly.)
3.3.11 [26] Let A be a symmetric convex body in Rf, which contains the ball 
p- B of radius p > O centered at the origin. Let V be any r-dimensional 
subspace of R?. Show that mes,(A N V) < A oor mes,(A), where mes, 
denotes r-dimensional measure. (Hint: first show that ifr < d, then there 
exists an r + |-dimensional space V; containing V such that mes,¥+ 


(AN Vi) => 22 mes,(A N V). Then continue inductively.) 














3.4 The Brunn—Minkowski inequality 


The purpose of this section is to prove the following lower bound for the volume 
mes(A + B) of a sum set. 


Theorem 3.16 (Brunn–Minkowski inequality) If $A$ and $B$ are non-empty
bounded open subsets of $\mathbf{R}^d$, then

\[ \mathrm{mes}(A + B)^{1/d} \ge \mathrm{mes}(A)^{1/d} + \mathrm{mes}(B)^{1/d}. \]


This inequality is sharp (Exercise 3.4.2). The theorem also applies if A and B 
are merely measurable (as opposed to being bounded and open), though one must 
then also assume that A + B is measurable; we will not prove this here. In general, 
there is no upper bound for mes(A + B); consider for instance the case when A is 
the $x$-axis and $B$ is the $y$-axis in $\mathbf{R}^2$, then $A, B$ both have measure zero but $A + B$
is all of $\mathbf{R}^2$. One can easily modify this example to show that there is no upper
bound for mes(A + B) in terms of mes(A) and mes(B) when A, B are bounded 
open sets. See [128] for a thorough survey of the Brunn—Minkowski inequality 
and related topics. 

To prove this theorem, it suffices to prove the following dimension-independent 
version: 


Theorem 3.17 If $A$ and $B$ are non-empty bounded open subsets of $\mathbf{R}^d$, and
$0 < \theta < 1$, then

\[ \mathrm{mes}((1 - \theta)\cdot A + \theta\cdot B) \ge \mathrm{mes}(A)^{1-\theta}\,\mathrm{mes}(B)^{\theta}. \]




To see why Theorem 3.17 implies the Brunn–Minkowski inequality, apply
Theorem 3.17 with $A$ and $B$ replaced by $\mathrm{mes}(A)^{-1/d}\cdot A$ and $\mathrm{mes}(B)^{-1/d}\cdot B$ to obtain

\[ \mathrm{mes}\left(\frac{1 - \theta}{\mathrm{mes}(A)^{1/d}}\cdot A + \frac{\theta}{\mathrm{mes}(B)^{1/d}}\cdot B\right) \ge 1 \]

for any $0 < \theta < 1$. Setting

\[ \theta := \frac{\mathrm{mes}(B)^{1/d}}{\mathrm{mes}(A)^{1/d} + \mathrm{mes}(B)^{1/d}} \]

we obtain the result. Conversely, one can easily deduce Theorem 3.17 from the
Brunn–Minkowski inequality (Exercise 3.4.1).
It remains to prove Theorem 3.17. We begin by first proving 





Lemma 3.18 (One-dimensional Brunn–Minkowski inequality) If $A$ and $B$
are non-empty bounded open subsets of $\mathbf{R}$, then $\mathrm{mes}(A + B) \ge \mathrm{mes}(A) + \mathrm{mes}(B)$.


Proof The hypotheses and conclusion of this lemma are invariant under indepen- 
dent translations of A and B, so we can assume that sup(A) = 0 and inf(B) = 0, 
hence in particular A and B are disjoint. But then we see that A + B contains both 
A and B separately, and we are done. 
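Lemma 3.18 can be checked directly on finite unions of open intervals, where the sum set is again a finite union of open intervals. The following is a small sketch under that assumption; the representation of a set as a list of `(lo, hi)` pairs and the helper names are choices made for this example only.

```python
def merge(intervals):
    """Merge open intervals (lo, hi) into a sorted list of disjoint open intervals."""
    out = []
    for lo, hi in sorted(intervals):
        if out and lo < out[-1][1]:          # genuine overlap with the previous interval
            out[-1] = (out[-1][0], max(out[-1][1], hi))
        else:
            out.append((lo, hi))
    return out


def measure(intervals):
    """Lebesgue measure of a finite union of open intervals."""
    return sum(hi - lo for lo, hi in merge(intervals))


def sumset(A, B):
    """A + B: the sum of (a0, a1) and (b0, b1) is the open interval (a0 + b0, a1 + b1)."""
    return merge([(a0 + b0, a1 + b1) for a0, a1 in A for b0, b1 in B])


A = [(0, 1), (2, 3)]
B = [(0, 1)]
print(measure(sumset(A, B)), measure(A) + measure(B))

# Equality holds for two single intervals, which are convex sets.
print(measure(sumset([(0, 1)], [(5, 7)])), 1 + 2)
```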














Using this Lemma, we deduce 


Proposition 3.19 (One-dimensional Prékopa–Leindler inequality) Let
$0 < \theta < 1$, and let $f, g, h : \mathbf{R} \to [0, \infty)$ be lower semi-continuous, compactly
supported non-negative functions on $\mathbf{R}$ such that

\[ h((1 - \theta)x + \theta y) \ge f(x)^{1-\theta}g(y)^{\theta} \]

for all $x, y \in \mathbf{R}$. Then we have

\[ \int_{\mathbf{R}} h \ge \left(\int_{\mathbf{R}} f\right)^{1-\theta}\left(\int_{\mathbf{R}} g\right)^{\theta}. \]


Proof By multiplying $f, g, h$ by appropriate positive constants we may normalize
$\sup_x f(x) = \sup_x g(x) = 1$.
Let $1 > \lambda > 0$ be arbitrary. Observe that if $f(x) > \lambda$ and $g(y) > \lambda$, then by
hypothesis $h((1 - \theta)x + \theta y) > \lambda$. Thus we have

\[ \{z \in \mathbf{R} : h(z) > \lambda\} \supseteq (1 - \theta)\cdot\{x \in \mathbf{R} : f(x) > \lambda\} + \theta\cdot\{y \in \mathbf{R} : g(y) > \lambda\}. \]

Since $f, g, h$ are lower semi-continuous and compactly supported, all the sets
above are open and bounded, hence by Lemma 3.18

\[ \mathrm{mes}(\{z \in \mathbf{R} : h(z) > \lambda\}) \ge (1 - \theta)\,\mathrm{mes}(\{x \in \mathbf{R} : f(x) > \lambda\}) + \theta\,\mathrm{mes}(\{y \in \mathbf{R} : g(y) > \lambda\}). \]

Integrating this for $\lambda \in [0, \infty)$ and using Fubini's theorem (cf. (1.6)), the claim
follows from the arithmetic mean–geometric mean inequality.














Now we iterate this to higher dimensions. 


Proposition 3.20 (Higher-dimensional Prékopa–Leindler inequality) Let
$0 < \theta < 1$, $d \ge 1$, and let $f, g, h : \mathbf{R}^d \to [0, \infty)$ be lower semi-continuous,
compactly supported non-negative functions on $\mathbf{R}^d$ such that

\[ h((1 - \theta)x + \theta y) \ge f(x)^{1-\theta}g(y)^{\theta} \]

for all $x, y \in \mathbf{R}^d$. Then we have

\[ \int_{\mathbf{R}^d} h \ge \left(\int_{\mathbf{R}^d} f\right)^{1-\theta}\left(\int_{\mathbf{R}^d} g\right)^{\theta}. \]


Proof We induct on $d$. When $d = 1$ this is just Proposition 3.19. Now assume
inductively that $d > 1$ and the claim has already been proven for all smaller
dimensions. Define the one-dimensional function $h_d : \mathbf{R} \to [0, \infty)$ by

\[ h_d(x_d) := \int_{\mathbf{R}^{d-1}} h(x_1,\ldots,x_d)\,dx_1\cdots dx_{d-1}, \]

and similarly define $f_d, g_d$. One can easily check (using Fatou's lemma)
that these functions are lower semi-continuous and compactly supported. Also,
applying the inductive hypothesis at dimension $d - 1$ we see that

\[ h_d((1 - \theta)x_d + \theta y_d) \ge f_d(x_d)^{1-\theta}g_d(y_d)^{\theta} \]

for all $x_d, y_d \in \mathbf{R}$. If we then apply the one-dimensional Prékopa–Leindler
inequality, we obtain the desired result.














If we apply Proposition 3.20 with $f := 1_A$, $g := 1_B$, and $h := 1_{(1-\theta)\cdot A + \theta\cdot B}$ we
obtain Theorem 3.17, and the Brunn–Minkowski inequality follows.


Exercises 


3.4.1 Show that Theorem 3.16 implies Theorem 3.17. 

3.4.2 Show that equality in Theorem 3.16 can occur when $A$ is convex, and
$B = \lambda\cdot A + x_0$ for some $\lambda > 0$ and $x_0 \in \mathbf{R}^d$. Conversely, if $A$ and $B$ are
non-empty bounded open subsets of $\mathbf{R}^d$, show that the preceding situation is
in fact the only case in which equality can be attained. (The case when $A$
and $B$ are merely measurable is a bit trickier, and is of course only true
up to sets of measure zero; see [128] for further discussion).

3.4.3 Let $A$ be a convex body in $\mathbf{R}^d$. Using Theorem 3.17, show that
the cross-sectional areas $f(x_d) := \mathrm{mes}(\{x' \in \mathbf{R}^{d-1} : (x', x_d) \in A\})$ are a
log-concave function of $x_d$, i.e. $f((1 - \lambda)x_d + \lambda y_d) \ge f(x_d)^{1-\lambda}f(y_d)^{\lambda}$
for all $0 \le \lambda \le 1$ and $x_d, y_d \in \mathbf{R}$; this is known as Brunn's inequality.

3.4.4 Let $A$ be a bounded open set with smooth boundary $\partial A$, and let $B$ be
a ball with the same volume as $A$. Prove the isoperimetric inequality
$\mathrm{mes}(\partial A) \ge \mathrm{mes}(\partial B)$. (Hint: Use the Brunn–Minkowski inequality
to estimate $\frac{\mathrm{mes}(A + \varepsilon\cdot B) - \mathrm{mes}(A)}{\varepsilon}$ for $\varepsilon > 0$ small, and then let $\varepsilon \to 0$.)

3.4.5 Let $A, B$ be symmetric convex bodies in $\mathbf{R}^d$. Show by examples that there
is no upper bound for $\mathrm{mes}(A + B)$ in terms of $\mathrm{mes}(A)$, $\mathrm{mes}(B)$, and $d$
alone, except in the $d = 1$ case. However, by using Lemma 3.12, show
that $\mathrm{mes}(A + B) \le 4^d\,\frac{\mathrm{mes}(A)\,\mathrm{mes}(B)}{\mathrm{mes}(A \cap B)}$.

3.4.6 [282] Let $A$ be a convex body in $\mathbf{R}^n$. Use Brunn's inequality to show
that $\mathrm{mes}(A \cap (x + A)) \ge (1 - r)^n\,\mathrm{mes}(A)$ whenever $0 \le r \le 1$ and $x \in
r\cdot(A - A)$. Conclude that

\[ \mathrm{mes}(A)^2 = \int_{A - A} \mathrm{mes}(A \cap (x + A))\,dx
   \ge \int_0^1 n(1 - r)^{n-1}\,\mathrm{mes}(A)\,\mathrm{mes}(r\cdot(A - A))\,dr
   = \frac{n!\,n!}{(2n)!}\,\mathrm{mes}(A)\,\mathrm{mes}(A - A), \]

whence one obtains the Rogers–Shepard inequality $\mathrm{mes}(A - A) \le
\binom{2n}{n}\mathrm{mes}(A)$. Show that this inequality is sharp when $A$ is a simplex. Use
Stirling's formula to compare this inequality with (3.5).

3.4.7 [162] Let $A, B$ be additive sets in $\mathbf{Z}^d$. Use the Brunn–Minkowski
inequality to show that $|A + B + \{0, 1\}^d| \ge 2^d\min(|A|, |B|)$. (Hint: consider
$A + [0, 1]^d$ and $B + [0, 1]^d$.)

3.4.8 [162] Let $A, B$ be additive sets in $\mathbf{R}^d$. Show that $|A + B + \{0, 1\}^d| \ge
2^d\min(|A|, |B|)$. (Hint: partition $\mathbf{R}^d$ into cosets of $\mathbf{Z}^d$, locate the coset
with the largest intersection with $A$ or $B$, and apply the preceding
exercise.)

3.4.9 Let $A$ be an open bounded set in $\mathbf{R}^d$. Show that $\mathrm{mes}(A + A) \ge 2^d\,\mathrm{mes}(A)$,
with equality if and only if $A$ is convex. (Hint: $A + A$ contains $2\cdot A$.)


3.5 Intersecting a convex set with a lattice 


In previous sections we have studied lattices, which are discrete but unbounded, and
convex sets, which are bounded but continuous. We now study the intersection $B \cap
\Gamma$ of a convex set $B$ and a lattice $\Gamma$ in a Euclidean space $\mathbf{R}^d$, which is then necessarily
a finite set. A model example of such a set is the discrete box $[0, N)$ for some
$N = (N_1,\ldots,N_d)$, which is the intersection of the convex body $\{(x_1,\ldots,x_d) :
-1 < x_i < N_i \text{ for all } 1 \le i \le d\}$ with the Euclidean lattice $\mathbf{Z}^d$. One of the main
objectives of this section shall be to show a "discrete John's lemma" which shows
that all intersections $B \cap \Gamma$ can be approximated in a certain sense by a discrete
box.

We begin with some elementary estimates. 


Lemma 3.21 Let $\Gamma$ be a lattice in $\mathbf{R}^d$. If $A \subset \mathbf{R}^d$ is an arbitrary bounded set and
$P \subset \mathbf{R}^d$ is a finite non-empty set, then

\[ |A \cap (\Gamma + P)| \le |(A - A) \cap (\Gamma + P - P)|. \tag{3.9} \]

If $B$ is a symmetric convex body, then

\[ (k\cdot B) \cap \Gamma \text{ can be covered by } (4k + 1)^d \text{ translates of } B \cap \Gamma \tag{3.10} \]

for all $k \ge 1$. If furthermore $\Gamma'$ is a sub-lattice of $\Gamma$ of finite index $|\Gamma/\Gamma'|$, then we
have

\[ |B \cap \Gamma'| \le |B \cap \Gamma| \le 9^d\,|\Gamma/\Gamma'|\,|B \cap \Gamma'|. \tag{3.11} \]


Proof We first prove (3.9). We may of course assume that $A \cap (\Gamma + P)$ contains at
least one element $a$. But then $A \cap (\Gamma + P) \subseteq ((A - A) \cap (\Gamma + P - P)) + a$, and
the claim follows. Now we prove (3.10). The lower bound is trivial, so it suffices to
prove the upper bound. By the preceding argument we can cover $(\frac{1}{2}\cdot B + x) \cap \Gamma$
by a translate of $B \cap \Gamma$ for any $x \in \mathbf{R}^d$. But by Corollary 3.15 we can cover $k\cdot B$
by $(4k + 1)^d$ translates of $\frac{1}{2}\cdot B$, and the claim (3.10) follows.

Finally, we prove (3.11). The lower bound is trivial. For the upper bound,
observe that $\Gamma$ is the union of $|\Gamma/\Gamma'|$ translates of $\Gamma'$, so it suffices to show that
$|B \cap (\Gamma' + x)| \le 9^d|B \cap \Gamma'|$ for all $x \in \mathbf{R}^d$. But by (3.9) and (3.10) we have

\[ |B \cap (\Gamma' + x)| \le |(2\cdot B) \cap \Gamma'| \le 9^d|B \cap \Gamma'| \]














as desired. 


Next, we recall a result of Gauss concerning the intersection of a large convex 
body with a lattice of full rank. 


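Lemma 3.22 Let $\Gamma \subset \mathbf{R}^d$ be a lattice of full rank, let $v_1,\ldots,v_d \in \Gamma$ be a set of
generators for $\Gamma$, and let $B$ be a convex body in $\mathbf{R}^d$. Then for large $R > 0$, we have

\[ |(R\cdot B) \cap \Gamma| = (R^d + O_{\Gamma,B,d}(R^{d-1}))\,\frac{\mathrm{mes}(B)}{|v_1 \wedge \cdots \wedge v_d|}. \]

Here $|v_1 \wedge \cdots \wedge v_d|$ denotes the volume of the parallelepiped with edges
$v_1,\ldots,v_d$.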



Proof We use a "volume-packing argument". Since $\Gamma$ has full rank, $v_1,\ldots,v_d$
are linearly independent. By applying an invertible linear transformation we may
assume that $v_1,\ldots,v_d$ is just the standard basis $e_1,\ldots,e_d$, so that $\Gamma = \mathbf{Z}^d$. Now
let $Q$ be the unit cube centered at the origin. Observe that the sets $\{x + Q : x \in
(R\cdot B) \cap \mathbf{Z}^d\}$ are disjoint up to sets of measure zero, and their union differs from
$R\cdot B$ only in the $\sqrt{d}$-neighborhood of the surface of $R\cdot B$, which has volume
$O_{\Gamma,B,d}(R^{d-1})$. The claim follows.
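The simplest instance of this volume-packing count is the open disc in the plane with the integer lattice, where the lemma predicts roughly $\pi R^2$ lattice points with an error of order $R$. The sketch below counts by brute force; the function name `lattice_points_in_disc` and the sample radii are illustrative choices, not anything taken from the text.

```python
import math


def lattice_points_in_disc(R):
    """Number of integer points (x, y) with x^2 + y^2 < R^2 (open disc of radius R)."""
    m = math.ceil(R)
    count = 0
    for x in range(-m, m + 1):
        for y in range(-m, m + 1):
            if x * x + y * y < R * R:
                count += 1
    return count


for R in (10, 50, 200):
    n = lattice_points_in_disc(R)
    # The ratio n / (pi R^2) should approach 1 as R grows.
    print(R, n, n / (math.pi * R * R))
```

The unit squares centred at the counted points cover the disc of radius $R - \sqrt{2}/2$ and lie inside the disc of radius $R + \sqrt{2}/2$, which bounds the discrepancy by $\pi(\sqrt{2}R + 1/2)$ in this example.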














Remark 3.23 The task of improving the error term $O_{\Gamma,B,d}(R^{d-1})$ for various
lattices and convex bodies (e.g. Gauss' circle problem) is a deep and important
problem in number theory and harmonic analysis, but we will not discuss this issue
in this book; our only concern is that the error term is strictly lower order than the
main term.


If $\Gamma$ is a lattice, we define a fundamental parallelepiped for $\Gamma$ to be any
parallelepiped whose edges $v_1,\ldots,v_d$ generate $\Gamma$. From the above lemma we
conclude that all fundamental parallelepipeds have the same volume; indeed this
volume is nothing more than the covolume $\mathrm{mes}(\mathbf{R}^d/\Gamma)$ of $\Gamma$. Thus for instance
$\mathrm{mes}(\mathbf{R}^d/\mathbf{Z}^d) = 1$.

By another volume-packing argument we can establish

\[ \mathrm{mes}(\mathbf{R}^d/\Gamma)\,|\Gamma/\Gamma'| = \mathrm{mes}(\mathbf{R}^d/\Gamma') \tag{3.12} \]

whenever $\Gamma' \subseteq \Gamma \subset \mathbf{R}^d$ are two lattices of full rank; see the exercises. In particular
we see that the quotient group $\Gamma/\Gamma'$ is finite.

Yet another volume-packing argument gives the following continuous and 
periodic analogue of (2.8). 


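Lemma 3.24 (Volume-packing lemma) Let $\Gamma \subset \mathbf{R}^d$ be a lattice of full rank, let
$V$ be a bounded open subset of $\mathbf{R}^d$, and let $P$ be a finite non-empty set in $\mathbf{R}^d$.
Then

\[ |(V - V) \cap (\Gamma + P - P)| \ge \frac{\mathrm{mes}(V)\,|P|}{\mathrm{mes}(\mathbf{R}^d/\Gamma)}. \tag{3.13} \]

In particular, we have

\[ |(V - V) \cap \Gamma| \ge \frac{\mathrm{mes}(V)}{\mathrm{mes}(\mathbf{R}^d/\Gamma)}. \tag{3.14} \]


Proof Let $B$ be the unit ball in $\mathbf{R}^d$, and let $R > 0$ be a large number. Consider
the integral of the function

\[ f(x) := \sum_{y \in \Gamma \cap (R\cdot B)}\ \sum_{p \in P} 1_{V + y + p}(x). \]

On the one hand we can compute this integral using Lemma 3.22 as

\[ \int_{\mathbf{R}^d} f(x)\,dx = \sum_{y \in \Gamma \cap (R\cdot B)}\ \sum_{p \in P} \mathrm{mes}(V + y + p)
   = |\Gamma \cap (R\cdot B)|\,|P|\,\mathrm{mes}(V)
   = (R^d + O_{\Gamma,B,d}(R^{d-1}))\,|P|\,\frac{\mathrm{mes}(B)\,\mathrm{mes}(V)}{\mathrm{mes}(\mathbf{R}^d/\Gamma)}. \]

On the other hand, from (3.9) we have

\[ f(x) \le |(x - V) \cap (\Gamma + P)| \le |(V - V) \cap (\Gamma + P - P)|. \]

Furthermore, $f(x)$ is only non-zero when $x$ lies in $R\cdot B + V + P \subseteq (R +
O_{V,P}(1))\cdot B$, which has volume $(R^d + O_{V,P,d}(R^{d-1}))\,\mathrm{mes}(B)$. Thus

\[ \int_{\mathbf{R}^d} f(x)\,dx \le |(V - V) \cap (\Gamma + P - P)|\,(R^d + O_{V,P,d}(R^{d-1}))\,\mathrm{mes}(B). \]

Combining these inequalities, dividing by $R^d\,\mathrm{mes}(B)$, and taking limits as $R \to \infty$, we
obtain the result.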














To see the utility of this lemma, let us pause to establish the following classical 
result in number theory, which we will need later in this book. Let $\|x\|_{\mathbf{R}/\mathbf{Z}}$ denote
the distance from x to the nearest integer. 


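Corollary 3.25 (Kronecker approximation theorem) Let $\alpha_1, \ldots, \alpha_d$ be real
numbers, and let $0 < \theta_1, \ldots, \theta_d < 1/2$. Then for any $N > 0$, we have

\[ |\{n \in (-N, N) : \|n\alpha_j\|_{\mathbf{R}/\mathbf{Z}} < \theta_j \text{ for all } j = 1, \ldots, d\}| \geq N\theta_1 \cdots \theta_d. \]

In particular, if $N\theta_1 \cdots \theta_d > 1$, then there exists an integer $0 < n < N$ such that
$\|n\alpha_j\|_{\mathbf{R}/\mathbf{Z}} < \theta_j$ for all $j = 1, \ldots, d$.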


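Proof Apply Lemma 3.24 with $\Gamma := \mathbf{Z}^d$,

\[ V := \{(t_1, \ldots, t_d) \in \mathbf{R}^d : 0 < t_j < \theta_j \text{ for all } 1 \leq j \leq d\}, \]

and $P$ equal to the arithmetic progression $P = [0, N) \cdot (\alpha_1, \ldots, \alpha_d)$ in $\mathbf{R}^d$.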
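The counting statement in Corollary 3.25 is easy to test numerically. The following is a minimal sketch, not part of the argument: the helper names `dist_to_nearest_integer` and `kronecker_count` are our own, $N$ is taken to be an integer, and floating-point arithmetic is assumed to be accurate enough for the chosen parameters.

```python
import math


def dist_to_nearest_integer(x: float) -> float:
    """Return ||x||_{R/Z}, the distance from x to the nearest integer."""
    return abs(x - round(x))


def kronecker_count(alphas, thetas, N: int) -> int:
    """Count integers n in (-N, N) with ||n * alpha_j|| < theta_j for every j."""
    return sum(
        1
        for n in range(-N + 1, N)
        if all(dist_to_nearest_integer(n * a) < t for a, t in zip(alphas, thetas))
    )


if __name__ == "__main__":
    alphas = (math.sqrt(2), math.sqrt(3))
    thetas = (0.1, 0.2)
    N = 200
    count = kronecker_count(alphas, thetas, N)
    lower_bound = N * math.prod(thetas)  # N * theta_1 * ... * theta_d
    print(count, ">=", lower_bound)
```

For generic frequencies one expects the count to be close to $(2N-1)\prod_j 2\theta_j$, comfortably above the guaranteed lower bound $N\theta_1 \cdots \theta_d$.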





Even when $B$ is symmetric, it is possible for $|B \cap \Gamma|$ to be extremely large compared
with $\mathrm{mes}(B)/\mathrm{mes}(\mathbf{R}^d/\Gamma)$; consider for instance $\Gamma := \mathbf{Z}^2$ and
$B := \{(x, y) : -1/N^2 < x < 1/N^2; -N < y < N\}$. However, if $B \cap \Gamma$ has full rank, then we can
complement the lower bound (3.14) with an upper bound:


Lemma 3.26 Let $\Gamma$ be a lattice of full rank in $\mathbf{R}^d$, and let $B$ be a symmetric convex
body in $\mathbf{R}^d$ such that the vectors in $B \cap \Gamma$ linearly span $\mathbf{R}^d$. Then

\[ |B \cap \Gamma| \leq \frac{3^d\,d!\,\mathrm{mes}(B)}{2^d\,\mathrm{mes}(\mathbf{R}^d/\Gamma)}. \]

This bound is within a factor of $3^d/(2d + 1)$ of being sharp, as can be seen by
the example where $\Gamma = \mathbf{Z}^d$ and $B$ is (a slight enlargement of) the octahedron
with vertices $\pm e_1, \ldots, \pm e_d$. Indeed this example motivates the volume-packing
argument used in the proof.

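Proof By hypothesis, $B \cap \Gamma$ contains a $d$-tuple $(v_1, \ldots, v_d)$ of linearly independent
vectors. Since $B \cap \Gamma$ is finite, we can choose $v_1, \ldots, v_d$ in order to minimize
the volume $\mathrm{mes}(O) = \frac{2^d}{d!}|v_1 \wedge \cdots \wedge v_d|$ of the octahedron $O$ with vertices
$\pm v_1, \ldots, \pm v_d$. Since $B$ is symmetric and convex, we see that $O \subseteq B$. Also $O$
does not contain any elements of $\Gamma$ other than $0, \pm v_1, \ldots, \pm v_d$, since otherwise one
could replace one of $v_1, \ldots, v_d$ with this element and reduce the volume of $O$, a
contradiction. Thus we see that the sets $\{x + \frac{1}{2} \cdot O : x \in B \cap \Gamma\}$ are all disjoint
and are contained in $B + \frac{1}{2} \cdot O \subseteq \frac{3}{2} \cdot B$. Thus

\[ |B \cap \Gamma| \leq \frac{\mathrm{mes}(\frac{3}{2} \cdot B)}{\mathrm{mes}(\frac{1}{2} \cdot O)} = \frac{3^d\,d!}{2^d\,|v_1 \wedge \cdots \wedge v_d|}\,\mathrm{mes}(B). \]

Since $|v_1 \wedge \cdots \wedge v_d| \geq \mathrm{mes}(\mathbf{R}^d/\Gamma)$, the claim follows.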

A special case of the volume-packing lemma gives 


Lemma 3.27 (Blichfeldt’s lemma) Let $\Gamma \subset \mathbf{R}^d$ be a lattice of full rank, and let
$V$ be an open set in $\mathbf{R}^d$ such that $\mathrm{mes}(V) > \mathrm{mes}(\mathbf{R}^d/\Gamma)$. Then there exist distinct
$x, y \in V$ such that $x - y \in \Gamma$.


Now let us apply Lemma 3.24 to the case $V = \frac{1}{2} \cdot B$ and $P = \{0\}$, where $B$ is
a symmetric convex body; we obtain the lower bound

\[ |B \cap \Gamma| \geq \frac{\mathrm{mes}(B)}{2^d\,\mathrm{mes}(\mathbf{R}^d/\Gamma)}, \tag{3.14} \]

which is the classical Minkowski’s first theorem. The assumption of symmetry
is essential. Consider for instance $\Gamma := \mathbf{Z}^2$ and a convex set of the form
$B := \{(x, y) : 1/3 < x < 2/3; -N < y < N\}$ for arbitrarily large $N$.


Theorem 3.28 (Minkowski’s first theorem) Let $\Gamma$ be a lattice of full rank, and
let $B$ be a symmetric convex body such that $\mathrm{mes}(B) \geq 2^d\,\mathrm{mes}(\mathbf{R}^d/\Gamma)$. Then the
closure of $B$ must contain at least one non-zero element of $\Gamma$ (in fact it contains
at least two, by symmetry). If we have strict inequality, $\mathrm{mes}(B) > 2^d\,\mathrm{mes}(\mathbf{R}^d/\Gamma)$,
then we can replace the closure of $B$ with the interior of $B$ in the above statement.

Proof Apply (3.14) to $(1 + \varepsilon) \cdot B$ and let $\varepsilon$ go to zero.

The constant in Minkowski’s first theorem is sharp. We may apply an invertible
linear transformation to set $\Gamma := \mathbf{Z}^d$, and then the example of the cube
$A := \{(t_1, \ldots, t_d) : -1 < t_j < 1 \text{ for all } j = 1, \ldots, d\}$ shows that the constant $2^d$ cannot
be improved. Nevertheless, it is possible to improve Minkowski’s first theorem by
generalizing it to a “multiparameter” version as follows.
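As a concrete instance of Theorem 3.28, take $\Gamma = \mathbf{Z}^2$ and the open symmetric parallelogram $\{(x, y) : |y| < Q, |x - \alpha y| < c/Q\}$, which has area $4c$. For any $c > 1$ the theorem guarantees a non-zero lattice point inside, which is a form of Dirichlet's approximation theorem. The sketch below finds such points by brute force; the function name `nonzero_points_in_body` and the default $c = 1.01$ are our own choices, made purely for illustration.

```python
import math


def nonzero_points_in_body(alpha: float, Q: int, c: float = 1.01):
    """List non-zero (x, y) in Z^2 with |y| < Q and |x - alpha*y| < c/Q.

    The body is an open symmetric parallelogram of area (2Q) * (2c/Q) = 4c,
    so for c > 1 Minkowski's first theorem predicts a non-empty answer.
    """
    points = []
    for y in range(-Q + 1, Q):
        centre = alpha * y
        # Only integers x close to alpha*y can satisfy |x - alpha*y| < c/Q.
        for x in range(math.floor(centre - c / Q), math.ceil(centre + c / Q) + 1):
            if (x, y) != (0, 0) and abs(x - centre) < c / Q:
                points.append((x, y))
    return points


if __name__ == "__main__":
    print(nonzero_points_in_body(math.sqrt(2), 50))
```

With $\alpha = \sqrt{2}$ and $Q = 50$ the output includes $(41, 29)$, reflecting the continued fraction convergent $41/29$ of $\sqrt{2}$.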


Definition 3.29 (Successive minima) Let $\Gamma$ be a lattice in $\mathbf{R}^d$ of rank $k$, and let
$B$ be a convex body in $\mathbf{R}^d$. We define the successive minima $\lambda_j = \lambda_j(B, \Gamma)$ for
$1 \leq j \leq k$ of $B$ with respect to $\Gamma$ as

\[ \lambda_j := \inf\{\lambda > 0 : \lambda \cdot B \text{ contains } j \text{ linearly independent elements of } \Gamma\}. \]

Note that $0 < \lambda_1 \leq \cdots \leq \lambda_k < \infty$.

Thus, for instance, if $\Gamma = \mathbf{Z}^d$ and $B$ is the box

\[ B := \{(t_1, \ldots, t_d) : |t_j| < a_j \text{ for all } j = 1, \ldots, d\} \]

for some $a_1 \geq a_2 \geq \cdots \geq a_d > 0$, then $\lambda_j = 1/a_j$ for $j = 1, \ldots, d$. Note that the
assumption that $\Gamma$ has rank $k$ ensures that the $\lambda_j$ are both finite and non-zero.
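The box example can be checked by brute force. The sketch below is an illustration under stated assumptions: it takes $\Gamma = \mathbf{Z}^d$, integer (or `Fraction`) half-widths $a_j$, and a hypothetical parameter `search_radius` that must be large enough to contain the minimizing lattice points. It sorts lattice points by the gauge $\max_j |t_j|/a_j$ of the box and greedily keeps those that increase the rank; all function names are our own.

```python
from fractions import Fraction
from itertools import product


def rank(vectors):
    """Rank of a list of integer vectors, by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def box_gauge(t, a):
    """Smallest scale lam such that t lies in the closure of lam * B for the box B."""
    return max(Fraction(abs(tj)) / aj for tj, aj in zip(t, a))


def successive_minima_box(a, search_radius: int):
    """Successive minima of the box {|t_j| < a_j} with respect to Z^d."""
    d = len(a)
    points = [t for t in product(range(-search_radius, search_radius + 1), repeat=d) if any(t)]
    points.sort(key=lambda t: box_gauge(t, a))
    chosen, minima = [], []
    for t in points:
        # Keep t only if it is linearly independent of the points chosen so far.
        if rank(chosen + [t]) > len(chosen):
            chosen.append(t)
            minima.append(box_gauge(t, a))
            if len(chosen) == d:
                break
    return minima


if __name__ == "__main__":
    print(successive_minima_box((3, 2, 1), search_radius=3))
```

Since the box is open, each infimum is not attained, but the gauge value of the chosen point equals the infimum, so for $a = (3, 2, 1)$ the sketch returns $1/3, 1/2, 1$ as in the text.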


Theorem 3.30 (Minkowski’s second theorem) Let $\Gamma$ be a lattice of full rank in
$\mathbf{R}^d$, and let $B$ be a symmetric convex body in $\mathbf{R}^d$, with successive minima
$0 < \lambda_1 \leq \cdots \leq \lambda_d$. Then there exist $d$ linearly independent vectors $v_1, \ldots, v_d \in \Gamma$
with the following properties:

• for each $1 \leq j \leq d$, $v_j$ lies in the boundary of $\lambda_j \cdot B$, but $\lambda_j \cdot B$ itself does not
contain any vectors in $\Gamma$ outside of the span of $v_1, \ldots, v_{j-1}$;

• the octahedron with vertices $\pm v_j$ contains no elements of $\Gamma$ in its interior,
other than the origin;

• we have

\[ \frac{2^d\,|\Gamma/(\mathbf{Z}^d \cdot (v_1, \ldots, v_d))|}{d!} \leq \frac{\lambda_1 \cdots \lambda_d\,\mathrm{mes}(B)}{\mathrm{mes}(\mathbf{R}^d/\Gamma)} \leq 2^d; \tag{3.15} \]

in particular, the sub-lattice $\mathbf{Z}^d \cdot (v_1, \ldots, v_d)$ of $\Gamma$ has bounded index:

\[ |\Gamma/(\mathbf{Z}^d \cdot (v_1, \ldots, v_d))| \leq d!. \tag{3.16} \]

One can state (3.15) rather crudely as

\[ \lambda_1 \cdots \lambda_d\,\mathrm{mes}(B) = \Theta_d(\mathrm{mes}(\mathbf{R}^d/\Gamma)), \]

thus relating the successive minima to the volume of the body $B$ and the covolume
of the lattice $\Gamma$.
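The two-sided bound (3.15) can be observed numerically in the plane. The following minimal sketch takes $B$ to be the unit disk (so $\mathrm{mes}(B) = \pi$) and a lattice with an integer basis, and checks the weaker form $2^d/d! \leq \lambda_1 \lambda_2\,\mathrm{mes}(B)/\mathrm{mes}(\mathbf{R}^2/\Gamma) \leq 2^d$ with $d = 2$ (the index factor is at least 1). The function names and the hypothetical `search_radius` cut-off are our own assumptions; the cut-off is a heuristic and not a guarantee for badly skewed bases.

```python
import math
from itertools import product


def successive_minima_disk(b1, b2, search_radius: int = 10):
    """Successive minima of the unit disk for the lattice Z*b1 + Z*b2 in R^2.

    Assumes integer basis vectors and that the minimisers have coefficients
    of absolute value at most search_radius.
    """
    vectors = []
    for i, j in product(range(-search_radius, search_radius + 1), repeat=2):
        if (i, j) != (0, 0):
            vectors.append((i * b1[0] + j * b2[0], i * b1[1] + j * b2[1]))
    vectors.sort(key=lambda v: v[0] ** 2 + v[1] ** 2)
    v1 = vectors[0]
    # Shortest lattice vector not parallel to v1 (non-zero integer cross product).
    v2 = next(v for v in vectors if v1[0] * v[1] - v1[1] * v[0] != 0)
    return math.hypot(*v1), math.hypot(*v2)


def check_minkowski_second(b1, b2) -> bool:
    """Check 2 <= lambda_1 * lambda_2 * pi / covolume <= 4 (the case d = 2)."""
    lam1, lam2 = successive_minima_disk(b1, b2)
    covolume = abs(b1[0] * b2[1] - b1[1] * b2[0])
    ratio = lam1 * lam2 * math.pi / covolume
    return 2.0 <= ratio <= 4.0


if __name__ == "__main__":
    print(successive_minima_disk((2, 0), (1, 3)), check_minkowski_second((2, 0), (1, 3)))
```

For the basis $(2, 0), (1, 3)$ the minima are $2$ and $\sqrt{10}$, and the ratio is about $3.31$, inside the interval $[2, 4]$ predicted by (3.15).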

Note that if $B$ contains no non-zero elements of $\Gamma$ then $\lambda_j \geq 1$ for all $j$, so
Minkowski’s second theorem implies Minkowski’s first theorem. Conversely, we
shall see from the proof that Minkowski’s second theorem can be obtained from
Minkowski’s first theorem by a non-isotropic dilation. The basis $v_1, \ldots, v_d$ is
sometimes referred to as a directional basis for $B$ with respect to $\Gamma$, although one
should caution that this basis does not quite generate $\Gamma$ (the index in (3.16) is
bounded but not necessarily equal to 1).


Proof By definition of $\lambda_1$, we may find a vector $v_1 \in \Gamma$ such that $v_1$ lies in the
closure of $\lambda_1 \cdot B$, but that $\lambda \cdot B$ contains no non-zero elements of $\Gamma$ for any $\lambda < \lambda_1$.
By definition of $\lambda_2$, we can then find a vector $v_2 \in \Gamma$, linearly independent from
$v_1$, such that $v_2$ lies in the closure of $\lambda_2 \cdot B$, but that $\lambda \cdot B$ contains no elements of $\Gamma$
outside of the span of $v_1$ for any $\lambda < \lambda_2$. Continuing inductively we can eventually
find a linearly independent set $v_1, \ldots, v_d$ in $\Gamma$ such that $v_j$ lies in the boundary of
$\lambda_j \cdot B$, but $\lambda_j \cdot B$ itself does not contain any vectors in $\Gamma$ outside of the span of
$v_1, \ldots, v_{j-1}$ for all $1 \leq j \leq d$.

The set $v_1, \ldots, v_d$ is a basis of $\mathbf{R}^d$; by applying an invertible linear transformation
we may assume it is the standard basis $e_1, \ldots, e_d$ (this changes both $B$ and $\Gamma$,
but one may easily verify that the conclusion of the theorem remains unchanged).
In particular this forces $\Gamma$ to contain $\mathbf{Z}^d$, hence by (3.12)

\[ \mathrm{mes}(\mathbf{R}^d/\Gamma) = \mathrm{mes}(\mathbf{R}^d/\mathbf{Z}^d)/|\Gamma/\mathbf{Z}^d| = 1/|\Gamma/\mathbf{Z}^d| \leq 1. \tag{3.17} \]

Let $O_d$ be the open octahedron whose vertices are $\pm e_1, \ldots, \pm e_d$. We need to
verify that $O_d$ contains no lattice points from $\Gamma$ other than the origin. Suppose
for contradiction that $O_d \cap \Gamma$ contained $w = t_1 e_1 + \cdots + t_j e_j$ where $1 \leq j \leq d$
and $t_j \neq 0$. Then $(1 + \varepsilon)w$ would be a convex combination of $\pm e_1, \ldots, \pm e_j$ for
some $\varepsilon > 0$. All of these points lie in the closure of $\lambda_j \cdot B$, hence $w$ lies in the
interior of $\lambda_j \cdot B$, but does not lie in the span of $e_1, \ldots, e_{j-1}$. But this contradicts
the construction of $v_j = e_j$. Hence $O_d \cap \Gamma = \{0\}$.

Next, observe that $\pm v_j = \pm e_j$ lies on the boundary of $\lambda_j \cdot B$ for each $1 \leq j \leq d$.
Thus $B$ contains the open octahedron whose vertices are $\pm e_1/\lambda_1, \ldots, \pm e_d/\lambda_d$.
This octahedron is easily verified to have volume $\frac{2^d}{d!\,\lambda_1 \cdots \lambda_d}$; indeed one can rescale to
the case when all the $\lambda_j$ are equal to 1, and then one can decompose the octahedron
into $2^d$ simplices, each of which has volume $1/d!$. This establishes the lower bound
in (3.15).

Now we establish the upper bound in (3.15). We need the following lemma.

Lemma 3.31 (Squeezing lemma) Let $K$ be a symmetric convex body in $\mathbf{R}^d$, let
$A$ be an open subset of $K$, let $V$ be a $k$-dimensional subspace of $\mathbf{R}^d$, and let
$0 < \theta \leq 1$. Then there exists an open subset $A'$ of $K$ such that $\mathrm{mes}(A') = \theta^k\,\mathrm{mes}(A)$
and $(A' - A') \cap V \subseteq \theta \cdot (A - A) \cap V$.

Note that we do not assume any convexity on $A$ or $A'$. Indeed the squeezing
operation we define in the proof below does not preserve the convexity of $A$.


Proof Without loss of generality we may take $V = \mathbf{R}^k$, and write
$\mathbf{R}^d = \mathbf{R}^k \times \mathbf{R}^{d-k}$. Let $\pi : \mathbf{R}^d \to \mathbf{R}^{d-k}$ be the orthogonal projection map, which
restricts to a map $\pi : K \to \pi(K)$. Let $f : \pi(K) \to K$ be any continuous right-inverse of $\pi$;
thus for instance $f(y)$ could be the center of mass of $\pi^{-1}(y)$.

A point $w \in K$ can be written as $w = (x, y)$, using the decomposition
$\mathbf{R}^d = \mathbf{R}^k \times \mathbf{R}^{d-k}$. Consider the map $\Phi$ which maps $w = (x, y)$ to $\theta w + (1 - \theta) f(y)$ and
set $A' = \Phi(A)$. Since both $w$ and $f(y)$ belong to $K$ and $K$ is convex, it follows
that $A'$ is an open subset of $K$. Furthermore, the second coordinate of $\Phi(w)$ is $y$ as
is that of $f(y)$. By applying Cavalieri’s principle (or Fubini’s theorem) we see that
$\mathrm{mes}(A') = \theta^k\,\mathrm{mes}(A)$ (the map contracts $A$ by a factor $\theta$ with respect to $V = \mathbf{R}^k$).

Consider a point $v = \Phi(w) - \Phi(w')$, where $w = (x, y)$, $w' = (x', y')$ are
points from $A$. If $v \in V$, then the second coordinate of $v$ is zero, which means
$y = y'$. Then by the definition of $\Phi$, $v = \theta(w - w')$. Thus $v \in \theta \cdot (A - A)$,
concluding the proof of Lemma 3.31.

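We apply the squeezing lemma iteratively, starting with $A_0 := \frac{\lambda_d}{2} \cdot B$, to create
open sets $A_1, \ldots, A_{d-1} \subseteq A_0$ such that

\[ \mathrm{mes}(A_j) = \left(\frac{\lambda_j}{\lambda_{j+1}}\right)^j \mathrm{mes}(A_{j-1}) \]

and

\[ (A_j - A_j) \cap \mathbf{R}^j \subseteq \frac{\lambda_j}{\lambda_{j+1}} \cdot (A_{j-1} - A_{j-1}) \cap \mathbf{R}^j \]

for all $1 \leq j \leq d - 1$, where $\mathbf{R}^j$ is the span of $e_1, \ldots, e_j$. In every application of
the squeezing lemma, $A_0$ plays the role of the mother set $K$.

Using the definition of $A_0$, it is easy to check that

\[ \mathrm{mes}(A_{d-1}) = \lambda_1 \cdots \lambda_d\,2^{-d}\,\mathrm{mes}(B). \tag{3.18} \]

Furthermore, by induction one can show

\[ (A_{d-1} - A_{d-1}) \cap \mathbf{R}^j \subseteq \frac{\lambda_j}{\lambda_d} \cdot (A_{j-1} - A_{j-1}) \cap \mathbf{R}^j. \]

On the other hand, $A_{j-1} \subseteq A_0 = (\lambda_d/2) \cdot B$. Since $B$ is symmetric,
$\frac{\lambda_d}{2} \cdot B - \frac{\lambda_d}{2} \cdot B = \lambda_d \cdot B$. It follows that

\[ (A_{d-1} - A_{d-1}) \cap \mathbf{R}^j \subseteq \lambda_j \cdot B \cap \mathbf{R}^j \]

for all $1 \leq j \leq d$.

By the definition of the successive minima, $\lambda_j \cdot B \cap \mathbf{R}^j$ does not contain any
lattice point in $\Gamma$, except for those in $\mathbf{R}^{j-1}$. This implies that $A_{d-1} - A_{d-1}$ does
not contain any point in $\Gamma$ other than the origin. Applying Blichfeldt’s lemma, we
conclude that

\[ \mathrm{mes}(A_{d-1}) \leq \mathrm{mes}(\mathbf{R}^d/\Gamma), \]

which when combined with (3.18) gives the upper bound in (3.15).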

We now give several applications of this theorem. First we “factorize” a convex
body $B$ as the finitely overlapping sum of a subset of $\Gamma$ and a dilate of a small
convex body $B'$, up to some scaling factors of $O(d)^{O(1)}$:


Lemma 3.32 Let $B$ be a symmetric convex body in $\mathbf{R}^d$, and let $\Gamma$ be a lattice in $\mathbf{R}^d$.
Then there exists a symmetric convex body $B' \subseteq B$ such that $B'$ contains no non-zero
elements of $\Gamma$, and such that $B \subseteq O(d^{3/2}) \cdot B' + (O(d^{3/2}) \cdot B) \cap \Gamma$. In particular,
the projection of $B$ in $\mathbf{R}^d/\Gamma$ is contained in the projection of $O(d^{3/2}) \cdot B'$.
Furthermore, we have the bounds


\[ \frac{\mathrm{mes}(B)}{O(d)^{5d/2}\,|B \cap \Gamma|} \leq \mathrm{mes}(B') \leq O(1)^d\,\frac{\mathrm{mes}(B)}{|B \cap \Gamma|}. \tag{3.19} \]


Proof By using John’s theorem and an invertible linear transformation we may
assume that $B_d \subseteq B \subseteq \sqrt{d} \cdot B_d$, where $B_d$ is the unit ball. We may assume that
the vectors in $B \cap \Gamma$ generate $\Gamma$, since otherwise we could replace $\Gamma$ by the lattice
generated by $B \cap \Gamma$.

Let us temporarily assume that $\Gamma$ has full rank, and thus that the linear span of
$B \cap \Gamma$ is $\mathbf{R}^d$. Thus if we let $\lambda_1 \leq \cdots \leq \lambda_d$ be the successive minima of $B$, then
we have $\lambda_j \leq 1$ for all $j$.

Now we take a directional basis $v_1, \ldots, v_d$ of $\Gamma$, and let $B'$ be the open octahedron
with vertices $\pm v_j$; this octahedron then contains no non-zero elements of
$\Gamma$, and is also contained in $B$ (since $\pm v_j/\lambda_j$ already lies on the boundary of $B$).
Observe that $d \cdot B'$ contains a parallelepiped with edges $v_1, \ldots, v_d$, and hence
$d \cdot B' + \Gamma = \mathbf{R}^d$. Thus

\[ B \subseteq d \cdot B' + ((B - d \cdot B') \cap \Gamma) \subseteq d \cdot B' + (((d + 1) \cdot B) \cap \Gamma) \]

as desired (with about $d^{1/2}$ room to spare). In particular we have

\[ \mathrm{mes}(B) \leq \mathrm{mes}(d \cdot B')\,|(d + 1) \cdot B \cap \Gamma| \leq (d(4d + 5))^d\,\mathrm{mes}(B')\,|B \cap \Gamma| \]

thanks to (3.10); this proves the lower bound in (3.19) (with a factor of $d^{d/2}$ to
spare). Conversely, the sets $\{x + \frac{1}{2} \cdot B' : x \in B \cap \Gamma\}$ are disjoint (since $B'$ contains
no non-zero elements of $\Gamma$) and contained in $2 \cdot B$, hence

\[ |B \cap \Gamma|\,\mathrm{mes}\left(\tfrac{1}{2} \cdot B'\right) \leq \mathrm{mes}(2 \cdot B) \]

which gives the upper bound in (3.19). This concludes the proof when $\Gamma$ has full
rank.

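Now suppose that $\Gamma$ has rank $r < d$; then after a rotation we may assume
that $\Gamma$ is contained in $\mathbf{R}^r \times \{0\} \subset \mathbf{R}^r \times \mathbf{R}^{d-r}$. The point is that the behavior in
the $d - r$ dimensions orthogonal to $\mathbf{R}^r$ is rather trivial and can be easily dealt
with as follows. Let $\tilde{B} \subseteq \mathbf{R}^r$ be the intersection of $B$ with $\mathbf{R}^r \times \{0\}$, identifying
$\mathbf{R}^r \times \{0\}$ with $\mathbf{R}^r$ in the usual manner. Then by John’s theorem we have the
inclusions

\[ \frac{1}{2} \cdot (\tilde{B} \times B_{d-r}) \subseteq B \subseteq \sqrt{d} \cdot (\tilde{B} \times B_{d-r}). \]

Applying the previous arguments to $\tilde{B}$ to obtain a set $\tilde{B}' \subseteq \tilde{B}$, and then defining
$B' := \frac{1}{2} \cdot (\tilde{B}' \times B_{d-r})$, we can verify the claim in this case (losing some additional
factors of $d^{1/2}$ and $d^{d/2}$); we omit the details.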














In this theorem, we did not use the full strength of Minkowski’s second theorem 
(in particular we did not need the upper bound). The notion of a directional vector 
is, however, useful. 

As another consequence of Minkowski’s second theorem, we show how to find 
large proper progressions inside sets of the form $B \cap \Gamma$.


Lemma 3.33 Let $B$ be a convex symmetric body in $\mathbf{R}^d$, and let $\Gamma$ be a lattice in
$\mathbf{R}^d$. Then there exists a proper progression $P$ in $B \cap \Gamma$ of rank at most $d$ such that
$|P| \geq O(d)^{-7d/2}\,|B \cap \Gamma|$.


Proof Applying John’s theorem (Theorem 3.13) and (3.10) followed by a linear
transformation, we may reduce to the case where $B$ is the unit ball $B = B_d$ in $\mathbf{R}^d$,
provided that we also reduce the $7d/2$ exponent to $3d$. We may assume that $B \cap \Gamma$
spans $\mathbf{R}^d$, since otherwise we may restrict $B$ to the linear span of $B \cap \Gamma$, which is
then isomorphic to a Euclidean space of some lower dimension. In particular this
means $\Gamma$ has full rank, and that the successive minima $0 < \lambda_1 \leq \cdots \leq \lambda_d$ of $B$
with respect to $\Gamma$ cannot exceed 1. Let $v_1, \ldots, v_d \in \Gamma \cap B$ be the corresponding
directional basis. Let $Q$ denote the parallelepiped

\[ Q := \{t_1 v_1 + \cdots + t_d v_d : 0 < t_j < 1/2 \text{ for all } j \in [1, d]\}. \]

By (3.16), since each translate of $Q - Q$ is a fundamental domain for
$\mathbf{Z}^d \cdot (v_1, \ldots, v_d)$, it contains at most $d!$ elements of $\Gamma$. By Lemma 2.14, we can cover
$B$ by at most $\frac{\mathrm{mes}(B + Q)}{\mathrm{mes}(Q)}$ translates of $Q - Q$, and thus

\[ |B \cap \Gamma| \leq d!\,\frac{\mathrm{mes}(B + Q)}{\mathrm{mes}(Q)}. \]

Since the $v_1, \ldots, v_d$ lie in the unit ball $B$, we see that $Q \subseteq \frac{d}{2} \cdot B$ and hence
$B + Q \subseteq (\frac{d}{2} + 1) \cdot B$. Crudely bounding $d! = O(d^d)$, we thus conclude that

\[ |B \cap \Gamma| \leq O(d)^{2d}/\mathrm{mes}(Q). \]

From (3.15) we have

\[ \lambda_1 \cdots \lambda_d \leq O(1)^d\,\mathrm{mes}(\mathbf{R}^d/\Gamma) \leq O(1)^d\,\mathrm{mes}(Q) \]

and thus

\[ |B \cap \Gamma| \leq O(d)^{2d}/\lambda_1 \cdots \lambda_d. \]

The claim now follows by setting $P := [-N, N] \cdot (v_1, \ldots, v_d)$, where
$N_j := 1/(2d\lambda_j)$ for $j \in [1, d]$; note that one can easily verify that $P$ is contained in $B \cap \Gamma$.

We now give an alternative approach that gives results similar to Lemma 3.33. 
We first need a lemma to modify the directional basis given by Minkowski’s second 
theorem (which only spans a sub-lattice of $\Gamma$, see (3.16)) into a genuine basis.


Theorem 3.34 (Mahler’s theorem) Let $\Gamma$ be a lattice of full rank in $\mathbf{R}^d$, and let
$B$ be a symmetric convex body in $\mathbf{R}^d$, with successive minima $0 < \lambda_1 \leq \cdots \leq \lambda_d$.
Let $v_1, \ldots, v_d$ be a directional basis for $\Gamma$. Then there exists a basis $w_1, \ldots, w_d$ of
$\Gamma$ such that $w_1$ lies in the closure of $\lambda_1 \cdot B$, and $w_i$ lies in the closure of $\frac{i\lambda_i}{2} \cdot B$ for
all $2 \leq i \leq d$. Furthermore, if $V_i$ is the linear span of $v_1, \ldots, v_i$, then $w_1, \ldots, w_i$
forms a basis for $\Gamma \cap V_i$.

The basis $w_1, \ldots, w_d$ is sometimes known as a Mahler basis for $\Gamma$.


Proof We choose $w_1 := v_1$; clearly $w_1$ forms a basis for $\Gamma \cap V_1$. Now suppose
inductively that $2 \leq i \leq d$ and $w_1, \ldots, w_{i-1}$ have already been chosen with the
desired properties. The lattice $\Gamma \cap V_i$ has one higher rank than $\Gamma \cap V_{i-1}$ and hence
there exists a vector $w_i$ in $\Gamma \cap (V_i \setminus V_{i-1})$ which, together with $\Gamma \cap V_{i-1}$, generates
$\Gamma \cap V_i$; in particular, $w_1, \ldots, w_i$ will generate $\Gamma \cap V_i$. Since $v_1, \ldots, v_i$ linearly
span $V_i$, we may write $w_i = t_1 v_1 + \cdots + t_{i-1} v_{i-1} + t_i v_i$ for some real numbers
$t_1, \ldots, t_i$ with $t_i \neq 0$. Since $v_i$ lies in $\Gamma \cap V_{i-1} + \mathbf{Z} \cdot w_i$, we must have $t_i = \pm 1/n$ for
some integer $n$. If $|t_i| = 1$, then $\Gamma \cap V_i$ is generated by $\Gamma \cap V_{i-1}$ and $v_i$, and we
can take $w_i := v_i$. Thus we may assume $|t_i| \leq 1/2$. Also, by subtracting integer
multiples of $v_1, \ldots, v_{i-1}$ from $w_i$ if necessary (which will not affect the fact that
$\Gamma \cap V_i$ is generated by $\Gamma \cap V_{i-1}$ and $w_i$) we may assume that $|t_j| \leq 1/2$ for all
$1 \leq j < i$. But since each $v_j$ lies in the closure of $\lambda_j \cdot B$ and hence of $\lambda_i \cdot B$, we
conclude by convexity that $w_i$ lies in the closure of $\frac{i\lambda_i}{2} \cdot B$, and so we can continue
the iterative construction. Setting $i = d$ we obtain the remaining claims in the
theorem.

As an application we give 


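Corollary 3.35 Let $\Gamma$ be a lattice of full rank in $\mathbf{R}^d$. Then there exist linearly
independent vectors $w_1, \ldots, w_d \in \Gamma$ which generate $\Gamma$, and such that

\[ \mathrm{mes}(\mathbf{R}^d/\Gamma) = |w_1 \wedge \cdots \wedge w_d| \geq \Omega(d^{-3d/2})\,|w_1| \cdots |w_d|. \tag{3.20} \]

Proof Let $w_1, \ldots, w_d$ be a Mahler basis for $\Gamma$ with respect to the unit ball $B$,
and let $\lambda_1, \ldots, \lambda_d$ be the successive minima. Then by Theorem 3.34 we have

\[ |w_1| \cdots |w_d| \leq \lambda_1 \prod_{i=2}^{d} \frac{i\lambda_i}{2}. \]

Applying (3.15) we obtain

\[ |w_1| \cdots |w_d| \leq \frac{2\,d!}{\mathrm{mes}(B)}\,\mathrm{mes}(\mathbf{R}^d/\Gamma). \]

On the other hand, from (3.8) we have

\[ \mathrm{mes}(B) = \frac{\Gamma(3/2)^d\,2^d}{\Gamma(d/2 + 1)} = (2\pi e + o(1))^{d/2}\,d^{-d/2}, \]

where $\Gamma(\cdot)$ here denotes the Gamma function rather than the lattice.
Crudely bounding $d! = O(d^d)$, the claim follows.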

As a consequence, we can give a “discrete John’s theorem” to characterize the 
intersection of a convex symmetric body with a lattice. 


Lemma 3.36 (Discrete John’s theorem) Let $B$ be a convex symmetric body
in $\mathbf{R}^d$, and let $\Gamma$ be a lattice in $\mathbf{R}^d$ of rank $r$. Then there exist an $r$-tuple
$w = (w_1, \ldots, w_r) \in \Gamma^r$ of linearly independent vectors in $\Gamma$ and an $r$-tuple
$N = (N_1, \ldots, N_r)$ of positive integers such that

\[ (r^{-2r} \cdot B) \cap \Gamma \subseteq (-N, N) \cdot w \subseteq B \cap \Gamma \subseteq (-r^{2r} N, r^{2r} N) \cdot w. \]

Notice that the fact $(-N, N) \cdot w \subseteq B \cap \Gamma$ is similar to the conclusion of
Lemma 3.33. However, the generalized arithmetic progression in Lemma 3.33
has higher density.


Proof We first observe, using John’s theorem and an invertible linear transformation,
that we may assume without loss of generality that $B_d \subseteq B \subseteq d \cdot B_d$, where
$B_d$ is the unit ball in $\mathbf{R}^d$. We may assume that $\Gamma$ has full rank $r = d$, for if $r < d$
then we may simply restrict $B$ to the linear span of $\Gamma$, which can then be identified
with $\mathbf{R}^r$. We may assume $d \geq 2$ since the claim is easy otherwise.

Now let $w = (w_1, \ldots, w_d)$ be as in Corollary 3.35. For each $j$, let $L_j$ be the
least integer greater than $1/(d|w_j|)$. Then from the triangle inequality we see that
$|l_1 w_1 + \cdots + l_d w_d| < 1$ whenever $|l_j| < L_j$, and so $(-L, L) \cdot w$ is contained in
$B_d$ and hence in $B$.

Now let $x \in B \cap \Gamma$. Since $w$ generates $\Gamma$, we have $x = l_1 w_1 + \cdots + l_d w_d$ for
some integers $l_1, \ldots, l_d$; since $B \subseteq d \cdot B_d$, we have $|x| < d$. Applying Cramer’s
rule to solve for $l_1, \ldots, l_d$ and (3.20), we have

\[ |l_j| = \frac{|x \wedge w_1 \wedge \cdots \wedge w_{j-1} \wedge w_{j+1} \wedge \cdots \wedge w_d|}{|w_1 \wedge \cdots \wedge w_d|}
   \leq \frac{|x|\,|w_1| \cdots |w_d|}{|w_j|\,|w_1 \wedge \cdots \wedge w_d|}
   \leq O(d)^{3d/2}\,\frac{|x|}{|w_j|}, \]

which is certainly at most $d^{2d} L_j$. It follows that $x \in (-d^{2d} L, d^{2d} L) \cdot w$, which is
what we wanted to prove. A more-or-less identical argument gives the inclusion
$(d^{-2d} \cdot B) \cap \Gamma \subseteq (-L, L) \cdot w$.

It would be of interest to see if the constant $r^{2r}$ could be significantly improved
here, for instance to $e^{O(r)}$ or even $r^{O(1)}$. Progress on this issue may well have
applications to improvements for Freiman’s theorem (see Chapter 5), which can
be viewed as a variant of the above theorem in which the set $B \cap \Gamma$ is replaced by
a more general set of small doubling.


Exercises 


3.5.1 Prove (3.12). 

3.5.2 Let $\alpha$ be an irrational number, and let $J$ be any open interval in $\mathbf{R}$. Show
that $\mathbf{Z} \cdot \alpha$ and $J + \mathbf{Z}$ have non-empty intersection. (In other words, the
integer multiples of $\alpha$ are dense in $\mathbf{R}/\mathbf{Z}$.)

3.5.3 Let $\Gamma$ be a lattice in $\mathbf{R}^d$, and let $A$ be a convex body (possibly asymmetric).
Show that o[A NT] < O0). 

3.5.4 Let $v_1, \ldots, v_d$ be any vectors in a lattice $\Gamma \subset \mathbf{R}^d$ of full rank. Show that
$|v_1 \wedge \cdots \wedge v_d|$ is an integer multiple of the covolume $\mathrm{mes}(\mathbf{R}^d/\Gamma)$.

3.5.5 Let $\Gamma$ be a lattice of full rank in $\mathbf{R}^d$, let $B$ be a symmetric convex body,
and let $v_1, \ldots, v_d$ be a directional basis with successive minima
$\lambda_1 \leq \cdots \leq \lambda_d$. Let $O$ be the open octahedron with vertices $\pm v_j/\lambda_j$. Show that
$O \subseteq B \subseteq O(d)^d \cdot O$. Thus Minkowski’s second theorem can be used to
give a rather weak version of John’s theorem.

3.5.6 Let $\Gamma$ be a lattice of full rank in $\mathbf{R}^d$, let $B$ be a symmetric convex body, and
let $\lambda_1 \leq \cdots \leq \lambda_d$ be the successive minima of $B$. Establish the bounds

\[ O(d)^{-O(d)} \prod_{1 \leq i \leq d} \max\left(1, \frac{1}{\lambda_i}\right) \leq |B \cap \Gamma| \leq O(d)^{O(d)} \prod_{1 \leq i \leq d} \max\left(1, \frac{1}{\lambda_i}\right). \tag{3.21} \]


3.5.7 Generalize Lemma 3.32 and Lemma 3.36 to the case when B is an asym- 
metric convex body. 





3.5.8 Let $A$ be a bounded open subset of $\mathbf{R}^d$, and let $B$, $C$ be open subsets of
$A$. Prove that

\[ \mathrm{mes}((B - B) \cap (C - C)) \geq \frac{\mathrm{mes}(B)\,\mathrm{mes}(C)\,\mathrm{mes}(A)}{\mathrm{mes}(A - B)\,\mathrm{mes}(A - C)}. \]

(Hint: use the volume-packing argument to locate a large set of the form
$(x + B) \cap (y + C)$ where $x \in A - B$ and $y \in A - C$.)

3.5.9 Let $B$ be the unit ball in $\mathbf{R}^5$, and let $\Gamma$ be the lattice generated by the five
basis vectors $e_1, \ldots, e_5$ and by $\frac{1}{2}(e_1 + \cdots + e_5)$. Show that in this case
the directional basis for $\Gamma$ does not actually generate $\Gamma$.


3.6 Progressions and proper progressions 


In this section we work in a fixed additive group Z, which may or may not be 
torsion-free. 

Recall from Definition 0.2 that a progression $P = a + [0, N] \cdot v$ is proper if the
map $n \mapsto n \cdot v$ is injective on $[0, N]$. Not all progressions are proper; however it
turns out that, just as John’s theorem (Theorem 3.13) shows that all convex sets are
in some sense comparable to ellipsoids, all progressions are comparable to proper
progressions. This is most obvious in the rank 1 case, in which every arithmetic
progression is equal (as a set) to a proper arithmetic progression:

Lemma 3.37 Let $a + [0, N] \cdot v$ be an arithmetic progression in an additive group
$Z$. Then there exists an $n > 0$ such that $a + [0, n) \cdot v$ is a proper arithmetic
progression and $a + [0, n) \cdot v = a + [0, N] \cdot v$.


Proof If $a + [0, N] \cdot v$ is already proper, then we are done. Otherwise, there
exist distinct $n_1, n_2 \in [0, N]$ such that $a + n_1 \cdot v = a + n_2 \cdot v$. In particular, there
exists $n \in [1, N]$ such that $n \cdot v = 0$. Let $n$ be the least integer in $[1, N]$ with this
property. Then $a + [0, n) \cdot v$ is necessarily proper, and by the Euclidean algorithm
it is clear that $a + [0, n) \cdot v = a + [0, N] \cdot v$.
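The proof of Lemma 3.37 is effectively an algorithm. As an illustration, the sketch below carries it out in the cyclic group $\mathbf{Z}_m$ (our choice of ambient group, made only so that the example is concrete); the function names `proper_length` and `progression` are our own.

```python
def proper_length(v: int, N: int, modulus: int) -> int:
    """Return n > 0 with a + [0, n) * v proper and equal to a + [0, N] * v in Z_modulus."""
    for n in range(1, N + 1):
        if (n * v) % modulus == 0:
            # Least n in [1, N] with n * v = 0: the progression repeats after n steps.
            return n
    # No relation n * v = 0 with n in [1, N]: the progression is already proper.
    return N + 1


def progression(a: int, v: int, length: int, modulus: int):
    """The set a + [0, length) * v inside Z_modulus."""
    return {(a + k * v) % modulus for k in range(length)}


if __name__ == "__main__":
    a, v, N, m = 5, 8, 10, 12
    n = proper_length(v, N, m)
    # Same set, but now with exactly n distinct elements.
    print(n, progression(a, v, n, m) == progression(a, v, N + 1, m))
```

For $a = 5$, $v = 8$, $N = 10$ in $\mathbf{Z}_{12}$ the least relation is $3 \cdot 8 = 0$, so the eleven-term progression collapses to a proper progression of length 3.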














We now consider the higher rank case; as with John’s theorem, the constants 
will deteriorate worse than exponentially in d. We first show the easier of the two 
containments, namely that every progression contains a large proper progression 
of equal or lesser rank. 


Theorem 3.38 Let $P$ be a progression of rank $d$ in an additive group $Z$.
Then $P$ contains a proper progression of rank at most $d$ and volume at least
$O(d)^{-5d}\,|P|$.

Remark 3.39 For a result of similar flavor (but proven by completely different
methods), see Theorem 4.42 below. Note that the $d = 1$ case already follows from
Lemma 3.37 (with a constant of 1 instead of $O(d)^{-5d}$).


Proof The idea is to pass to a convex body, apply Lemma 3.32 to obtain a “proper”
subset of this body, and then use Lemma 3.33 to pass back to a progression.

By translating and enlarging $P$ slightly we may assume $P = [-N, N] \cdot v$.
We may assume that none of the components $N_j$ of $N$ are equal to 0 or 1, since
otherwise we could refine $P$ by at worst a factor of $3^d$ to eliminate those dimensions.
Now consider the set $\Gamma := \{n \in \mathbf{Z}^d : n \cdot v = 0\}$, which is clearly a sub-lattice of
$\mathbf{Z}^d$, and let $A$ be the symmetric convex box

\[ A := \{(x_1, \ldots, x_d) \in \mathbf{R}^d : -N_j \leq x_j \leq N_j \text{ for all } 1 \leq j \leq d\}. \]

By Lemma 3.32, we may find a symmetric convex subset $A'$ of $A$ such that $A' - A'$
is disjoint from $\Gamma \setminus \{0\}$, and such that $A \subseteq O(d)^{3/2} \cdot A' + \Gamma$.
From Corollary 3.15, we thus see that $A$ can be covered by $O(d)^{3d/2}$ translates
of $\frac{1}{2} \cdot A' + \Gamma$. Since $[-N, N] = A \cap \mathbf{Z}^d$ and $\Gamma \subseteq \mathbf{Z}^d$, we conclude that $[-N, N]$
can be covered by $O(d)^{3d/2}$ sets of the form $[(\frac{1}{2} \cdot A' + x) \cap \mathbf{Z}^d] + \Gamma$ for some $x \in \mathbf{R}^d$.
Taking inner products with $v$, we conclude that $P = [-N, N] \cdot v$ can be covered by $O(d)^{3d/2}$
sets of the form $[(\frac{1}{2} \cdot A' + x) \cap \mathbf{Z}^d] \cdot v$. By the pigeonhole principle, there must
thus exist an $x$ such that

\[ \left|\left(\tfrac{1}{2} \cdot A' + x\right) \cap \mathbf{Z}^d\right| \geq \Omega\left(\frac{1}{d}\right)^{3d/2} |P| \]

and hence by (3.9)

\[ |A' \cap \mathbf{Z}^d| \geq \Omega\left(\frac{1}{d}\right)^{3d/2} |P|. \]

We now apply Lemma 3.33 to find a proper progression $\tilde{P} \subseteq A' \cap \mathbf{Z}^d \subseteq [-N, N]$ of
rank at most $d$ such that

\[ |\tilde{P}| \geq O(d)^{-7d/2}\,|A' \cap \mathbf{Z}^d| \geq \Omega\left(\frac{1}{d}\right)^{5d} |P|. \]

The set $\tilde{P} \cdot v$ is then clearly a progression of rank at most $d$ contained in $P$; it is
proper since $A' - A'$ is disjoint from $\Gamma \setminus \{0\}$ (so in particular $|\tilde{P} \cdot v| = |\tilde{P}|$). The
claim follows.

Now we show the more difficult containment, that every progression can be 
contained inside a proper progression of equal or lesser rank, but somewhat larger 
volume. 


Theorem 3.40 Let $P$ be a progression of rank $d$ in an additive group $Z$. Then
$P$ is contained in a proper progression $Q$ of rank at most $d$ and volume at most
$d^{C_0 d^3}|P|$ for some absolute constant $C_0 > 0$. Also, $Q$ is contained in a translate of
$d^{C_0 d^2} P$. If $d \geq 2$ and $P$ is not proper, then $Q$ can be chosen to have rank at most
$d - 1$. Finally, if $Z$ is torsion-free and $P$ is symmetric, then one can ensure that
$Q$ is symmetric also.

Remark 3.41 Theorems of this type first appeared in the literature in [26], and
later in some unpublished work of Gowers–Walters and Ruzsa. The version we
give here is taken from [365].

Comparison with Theorem 3.38 suggests that the factor $d^{C_0 d^3}$ is probably not
best possible, but we do not know what the correct constant here should be. This
theorem can be thought of as the analogue of Corollary 3.8 or Corollary 3.9, but
for progressions rather than finitely generated additive groups.


Proof This claim is analogous to the basic linear algebra statement that every
linear space spanned by $d$ vectors is equal to a linear space with a basis of at most $d$
vectors. Recall that the proof of that fact proceeds by a descent argument, showing
that if the $d$ spanning vectors were linearly dependent, then one could exploit that
dependence to “drop rank” and span the same linear space with $d - 1$ vectors. Our
proof of Theorem 3.40 shall be based on a similar strategy.

We shall work only in the case when $Z$ is torsion-free; the general case is proven
similarly but contains a few additional technicalities, and we leave it as an exercise
(Exercise 3.6.3).

We induct on $d$. When $d = 1$ the claim follows from Lemma 3.37. Now suppose
inductively that $d \geq 2$, and the claim has already been proven for $d - 1$ (for
arbitrary groups $Z$ and arbitrary progressions $P$). Let $P = a + [0, N] \cdot v$ be a
progression in $Z$ of rank $d$, where $N = (N_1, \ldots, N_d)$ and $v = (v_1, \ldots, v_d)$; we
may translate $P$ so that the base point $a$ equals 0. If $P$ is proper, then we are
done. Similarly, if one of the $N_j$ is equal to zero, then we are done by induction
hypothesis. Suppose instead that $P$ is not proper and all the $N_j$ are at least 1; then
there exist distinct $n, n' \in [0, N]$ such that $n \cdot v = n' \cdot v$. If we then let $\Gamma_0 \subseteq \mathbf{Z}^d$
denote the lattice $\{m \in \mathbf{Z}^d : m \cdot v = 0\}$, then we see that $\Gamma_0 \cap [-N, N]$ contains
at least one non-zero element, namely $n' - n$.
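The failure of properness is thus witnessed by a non-zero element of $\Gamma_0 \cap [-N, N]$. The following minimal sketch searches for such a relation by brute force in the simplest torsion-free setting, where the ambient group $Z$ is the integers; this choice of group and the function names `find_relation` and `cardinality` are our own assumptions for illustration.

```python
from itertools import product


def find_relation(v, N):
    """Return a non-zero m in [-N, N] with m . v = 0, or None if there is none.

    Here v is a tuple of integer generators and N a tuple of non-negative
    integer dimensions; the ambient group is taken to be the integers.
    """
    ranges = [range(-Nj, Nj + 1) for Nj in N]
    for m in product(*ranges):
        if any(m) and sum(mj * vj for mj, vj in zip(m, v)) == 0:
            return m
    return None


def cardinality(v, N) -> int:
    """Number of distinct elements of the progression [0, N] . v."""
    ranges = [range(0, Nj + 1) for Nj in N]
    return len({sum(nj * vj for nj, vj in zip(n, v)) for n in product(*ranges)})


if __name__ == "__main__":
    # Volume 16 in both cases; only the second progression is proper.
    print(find_relation((2, 3), (3, 3)), cardinality((2, 3), (3, 3)))
    print(find_relation((1, 10), (3, 3)), cardinality((1, 10), (3, 3)))
```

A progression $[0, N] \cdot v$ is proper exactly when no such relation exists, that is, when its cardinality equals its volume $\prod_j (N_j + 1)$.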

Let $m = (m_1, \ldots, m_d)$ be a non-zero element of $\Gamma_0 \cap [-N, N]$, thus

\[ m_1 \cdot v_1 + \cdots + m_d \cdot v_d = 0. \tag{3.22} \]

We may assume without loss of generality that $m$ is irreducible in $\Gamma_0$. Since $Z$ is
torsion-free, this also implies that $m$ is irreducible in $\mathbf{Z}^d$ (i.e. that the $m_1, \ldots, m_d$
have no common divisor). The strategy shall be to contain
$P$ inside a progression $Q$ of rank $d - 1$ and size

\[ |Q| \leq O(d)^{O(d^2)}\,|P| \tag{3.23} \]

such that $Q$ is contained in a translate of $d^{O(d)} P$. If we can achieve this, then by
the induction hypothesis we can contain $Q$ inside a proper progression $R$ of rank
at most $d - 1$ and cardinality

\[ |R| \leq (d - 1)^{C_0 (d-1)^3}\,O(d)^{O(d^2)}\,|P| \]

and which is contained in a translate of $(d - 1)^{C_0 (d-1)^2} d^{O(d)} P$. If $C_0$ is sufficiently large,
we will have completed the induction.

It remains to cover $P$ by a progression of rank at most $d - 1$ with the bound
(3.23) and contained in a translate of $d^{O(d)} P$. Observe that $m$ lies in $[-N, N]$, so
the rational numbers $m_1/N_1, \ldots, m_d/N_d$ lie between $-1$ and 1. Without loss of
generality we may assume that $m_d/N_d$ has the largest magnitude, thus

\[ |m_d|/N_d \geq |m_j|/N_j \tag{3.24} \]

for all $1 \leq j \leq d$. By replacing $v_d$ with $-v_d$ if necessary, we may also assume that
$m_d$ is positive.

To exploit the cancellation in (3.22) we introduce the rational vector
$q \in \frac{1}{m_d} \cdot \mathbf{Z}^{d-1}$ by the formula

\[ q := -\left(\frac{m_1}{m_d}, \ldots, \frac{m_{d-1}}{m_d}\right). \]

Since $m$ is irreducible in $\mathbf{Z}^d$, we see, for any integer $n$, that $n \cdot q$ lies in $\mathbf{Z}^{d-1}$ if and
only if $n$ is a multiple of $m_d$.

Next, let $\Gamma \subset \mathbf{R}^{d-1}$ denote the lattice $\Gamma := \mathbf{Z}^{d-1} + \mathbf{Z} \cdot q$. Since $q$ is rational,
this is indeed a lattice; since it contains $\mathbf{Z}^{d-1}$, it is certainly full rank. We define
the homomorphism $f : \Gamma \to Z$ by the formula

\[ f((n_1, \ldots, n_{d-1}) + n_d q) := (n_1, \ldots, n_d) \cdot v; \]

the condition (3.22) ensures that this homomorphism is indeed well defined, in the
sense that different representations $x = (n_1, \ldots, n_{d-1}) + n_d q$ of the same vector
$x \in \Gamma$ give the same value of $f(x)$. We also let $B \subset \mathbf{R}^{d-1}$ denote the convex
symmetric body

\[ B := \{(t_1, \ldots, t_{d-1}) \in \mathbf{R}^{d-1} : -3N_j < t_j < 3N_j \text{ for all } 1 \leq j \leq d - 1\}. \]

We now claim the inclusions

\[ P \subseteq f(B \cap \Gamma) \subseteq 5P - 5P. \]


To see the first inclusion, let $n \cdot v \in P$ for some $n \in [0, N]$; then we have
$n \cdot v = f((n_1, \ldots, n_{d-1}) + n_d q)$; from (3.24) we see that the $j$th coefficient of
$(n_1, \ldots, n_{d-1}) + n_d q$ has magnitude at most $2N_j$, and thus $n \cdot v$ lies in $f(B \cap \Gamma)$
as claimed. To see the second inclusion, let $(n_1, \ldots, n_{d-1}) + n_d q$ be an element
of $B \cap \Gamma$. By subtracting if necessary an integer multiple of $m_d$ from $n_d$ (and thus
adding integer multiples of $m_1, \ldots, m_{d-1}$ to $n_1, \ldots, n_{d-1}$) we may assume that
$|n_d| \leq |m_d|/2$. By (3.24) and the definition of $B$, this forces $|n_j| \leq 5N_j$ for all
$1 \leq j \leq d$, and hence

\[ f((n_1, \ldots, n_{d-1}) + n_d q) = (n_1, \ldots, n_d) \cdot v \in [-5N, 5N] \cdot v = 5P - 5P. \]


Next, we apply Lemma 3.36 to find vectors $w_1, \ldots, w_{d-1} \in \Gamma$ and $M_1, \ldots, M_{d-1}$
such that

\[ (-M, M) \cdot w \subseteq B \cap \Gamma \subseteq (-d^{2d} M, d^{2d} M) \cdot w. \]

Applying the homomorphism $f$, we obtain

\[ (-M, M) \cdot f(w) \subseteq f(B \cap \Gamma) \subseteq (-d^{2d} M, d^{2d} M) \cdot f(w) \]

where $f(w) := (f(w_1), \ldots, f(w_{d-1}))$. Observe that $(-d^{2d} M, d^{2d} M) \cdot f(w)$
is a progression of rank $d - 1$ which contains $f(B \cap \Gamma)$ and hence contains $P$.
Furthermore, by two applications of Lemma 3.10 we have

\[ |(-d^{2d} M, d^{2d} M) \cdot f(w)| \leq (O(d))^{O(d^2)}\,|f(B \cap \Gamma)|
   \leq (O(d))^{O(d^2)}\,|5P - 5P|
   \leq (O(d))^{O(d^2)}\,O(1)^d\,|P| \]

which proves (3.23). Also, since $(-M, M) \cdot f(w)$ is contained in $f(B \cap \Gamma)$,
which is contained in $5P - 5P$, which is a translate of $10P$, we see that
$(-d^{2d} M, d^{2d} M) \cdot f(w)$ is contained in a translate of $d^{O(d)} P$. This completes
the induction and proves the theorem. When $P$ is symmetric, one can easily modify
the above argument to ensure that all progressions in the above construction are
also symmetric; we leave this modification to the interested reader.


Exercises 


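3.6.1 Let $P = a + [0, N] \cdot v$ be a progression of rank $d$ in some additive group
$Z$, and let $\Gamma := \{n \in \mathbf{Z}^d : n \cdot v = 0\}$ be the associated sub-lattice of $\mathbf{Z}^d$.
Prove the inequalities

$$\frac{|[0,N]|}{|P|} \le |[-N,N] \cap \Gamma| \le 3^d \frac{|[0,N]|}{|P|}.$$
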
Thus the ratio between the volume and cardinality of a progression $P$ is
essentially controlled by the quantity $|[-N, N] \cap \Gamma|$. (Hints: for the lower
bound, first use Cauchy-Schwarz to obtain a lower bound for the cardinality of
$\{(n, n') \in [0, N] \times [0, N] : n \cdot v = n' \cdot v\}$. For the upper bound, consider the
multiplicity of the map $f : [-N, 2N] \to Z$ defined by $f(n) := n \cdot v$.)

3.6.2 Let $[0, N]$ be a box in $\mathbf{Z}^d$, and let $\Gamma$ be a sub-lattice of $\mathbf{Z}^d$. Show that
$|[-kN, kN] \cap \Gamma| \le (2k)^d |[-N, N] \cap \Gamma|$ for all integers $k \ge 1$.

3.6.3 Prove Theorem 3.40 in the case when $Z$ is not necessarily torsion-free.
(The main new difficulty is that the vector $m$ is not always irreducible in
$\mathbf{Z}^d$; in such a case one will have to “quotient out” a finite cyclic group
from $P$ before proceeding with the rest of the argument. However, this
will only introduce additional factors of $C^d$ into the inductive bound
(3.23), which is acceptable.) Note that the second part of the Theorem
does not extend to the torsion case, as can already be seen by considering
P=Z=L). 

3.6.4 Prove an extension of Theorem 3.40 in the torsion-free case in which
one requires that $kQ$ is also proper for some fixed constant $k \ge 1$ (of
course, the bounds on $Q$ will depend on $k$). Note that the torsion-free
hypothesis is now essential, as can be seen by considering the case when
$P = [1, N] \cdot 1$ in $\mathbf{Z}_N$.

3.6.5 [349] Let $N_1, N_2, a_1, a_2$ be positive integers such that $0 < a_2 < N_1/5$
and $0 < a_1 < N_2/5$, and $a_1, a_2$ are coprime. Use the Chinese remainder
theorem to show the inclusion

$$\left[\tfrac{1}{5}(a_1 N_1 + a_2 N_2), \tfrac{4}{5}(a_1 N_1 + a_2 N_2)\right] \subseteq [0, (N_1, N_2)] \cdot (a_1, a_2).$$

Conclude that if $P$ is any progression of rank 2 in the integers
of dimensions $N_1, N_2$ and steps $v_1, v_2$ with $0 < v_2 < N_1/5$ and
$0 < v_1 < N_2/5$, then $P$ contains a proper arithmetic progression of length
$3(N_1 v_1 + N_2 v_2)/(5 \gcd(v_1, v_2))$ and spacing $\gcd(v_1, v_2)$.

3.6.6 [349] Let $A$ be an additive set in an ambient group $Z$. Show that there exists
$d = O(\log |A|)$ and distinct elements $v_1,\ldots,v_d \in A$ such that the cube
$[0, 1]^d \cdot (v_1,\ldots,v_d)$ has cardinality at least $\frac{1}{2}|A|$. (Hint: Using (2.21),
show that if $S$ is any additive set in $Z$ such that $|S| \le \frac{1}{2}|A|$, then there exists
$a \in A$ such that $|S \cup (S + a)| \ge \frac{3}{2}|S|$. Then use the greedy algorithm.)


4 Fourier-analytic methods


In Chapter 1 we have already seen the power of the probabilistic method in additive 
combinatorics, in which one understands the additive structure of a random object 
by means of computing various averages or moments of that object. In this chapter 
we develop an equally powerful tool, that of Fourier analysis. This is another way 
of computing averages and moments of additively structured objects; it is similar 
to the probabilistic method but with an important new ingredient, namely that the 
quantities being averaged are now “twisted” or “modulated” by some very special 
complex-valued phase functions known as characters. This gives rise to the concept 
of a Fourier coefficient of a set or function, which measures the bias that object has 
with respect to a certain character. These coefficients serve two major purposes in 
this theory. Firstly, one can exploit the orthogonality between different characters 
to obtain non-trivial bounds on these coefficients; this orthogonality plays a role 
somewhat similar to the role of independence in probability theory. Secondly, 
Fourier coefficients are very good at controlling the operation of convolution, 
which is the analog of the sum set operation, but for functions instead of sets. 
Because of this, the Fourier transform is ideal for studying certain arithmetic 
quantities, most notably the additive energy introduced in Definition 2.8. 

Using Fourier analysis, one can essentially divide additive sets $A$ into two
classes. At one extreme are the pseudo-random sets, whose Fourier transform is
very small (except at 0); we shall introduce the linear bias $\|A\|_u$ and the $\Lambda(p)$
constants to measure this pseudo-randomness. Such sets are very “mixing” with
respect to set addition (or to locating progressions of length three), and as the
terminology implies, they behave more or less like random sets. At the other extreme
are the almost periodic sets, which include arithmetic progressions, Bohr sets, and
other sets with small doubling constant or large additive energy. The behavior of
these sets with respect to set addition or progressions of length three is almost
completely described by a rather small spectrum $\mathrm{Spec}_\alpha(A)$, defined as the set
of frequencies where the Fourier transform of $1_A$ is large. We shall rely on this
dichotomy between randomness and structure in a number of ways, most
strikingly in proving Roth’s celebrated theorem (which we discuss in Chapter 10) that
subsets of integers of positive upper density contain infinitely many progressions
of length 3. (Progressions of higher length cannot be treated by linear Fourier
techniques, requiring either higher order Fourier analysis or other approaches; see
Chapter 11.)

Fourier analysis can be performed on any additive group $Z$ (and even on
non-abelian groups). However, we shall only need this transform on finite groups,
where the theory is slightly simpler technically. Thus we shall restrict our
attention exclusively to the finite case. The cases $Z = \mathbf{Z}$, $Z = \mathbf{R}/\mathbf{Z}$, and $Z = \mathbf{R}$ are
also of importance to additive combinatorics (in particular leading to the
Hardy-Littlewood circle method in analytic number theory), but it turns out that the finite
Fourier theory forms an acceptable substitute for these infinite Fourier theories in
our applications.


4.1 Basic theory 


Let $Z$ be a finite additive group (for instance, $Z$ could be a cyclic group $Z = \mathbf{Z}_N$). In
this section we recall the basic theory of the finite Fourier transform on such groups.

Fourier analysis relies on the duality between a group $Z$ and its Pontryagin dual
$\hat Z$, which can be defined as the space of homomorphisms from $Z$ to the circle group
$\mathbf{R}/\mathbf{Z}$. In the case of finite groups, it turns out that a group $Z$ and its Pontryagin dual
$\hat Z$ are always isomorphic, and so it shall be convenient to identify the two in order
to simplify the theory slightly. This can be done by means of a non-degenerate
bilinear form:


Definition 4.1 (Bilinear forms) A bilinear form on an additive group $Z$ is a map
$(\xi, x) \mapsto \xi \cdot x$ from $Z \times Z$ to $\mathbf{R}/\mathbf{Z}$, which is a homomorphism in each of the
variables $\xi, x$ separately. We say that the form is non-degenerate if for every
non-zero $\xi$ the map $x \mapsto \xi \cdot x$ is not identically zero, and similarly for every non-zero
$x$ the map $\xi \mapsto \xi \cdot x$ is not identically zero. We say the form is symmetric if

$$\xi \cdot x = x \cdot \xi.$$


Examples 4.2 If $Z$ is a cyclic group $\mathbf{Z}_N$ then the bilinear form $x \cdot \xi := x\xi/N$ is
symmetric and non-degenerate. If $Z$ is a standard vector space $F^n$ over a finite
field $F$, then the bilinear form $(x_1,\ldots,x_n) \cdot (\xi_1,\ldots,\xi_n) := \phi(x_1 \xi_1 + \cdots + x_n \xi_n)$
is symmetric and non-degenerate whenever $\phi : F \to \mathbf{R}/\mathbf{Z}$ is any non-trivial
homomorphism from $F$ to $\mathbf{R}/\mathbf{Z}$ (e.g. if $F = \mathbf{Z}_p$ we can take $\phi(x) := x/p$). This
particular choice has the useful additional property that $a\xi \cdot x = \xi \cdot ax$ for all
$a \in F$ and $x, \xi \in Z$.




Lemma 4.3 (Existence of bilinear forms) Every finite additive group Z has at 
least one non-degenerate symmetric bilinear form. 


Proof From Corollary 3.8 we know that every finite additive group is the direct
sum of cyclic groups. We have already seen in Example 4.2 that each cyclic group
has a symmetric non-degenerate bilinear form. Finally, observe that if $Z_1$ and
$Z_2$ have symmetric non-degenerate bilinear forms, then the direct sum $Z_1 \oplus Z_2$
also has a symmetric non-degenerate bilinear form, defined by
$(\xi_1, \xi_2) \cdot (x_1, x_2) := \xi_1 \cdot x_1 + \xi_2 \cdot x_2$. The claim follows.














Remark 4.4 A given additive group $Z$ generally has multiple bilinear forms (see
Exercise 4.1.10), but from the point of view of Fourier analysis they are all
equivalent¹. The symmetry property has some minor aesthetic advantages but is not
essential to the Fourier theory, as the physical space variable and the frequency
space variable usually play completely different roles.


Henceforth we fix a finite additive group $Z$, equipped with a non-degenerate
symmetric bilinear form $\xi \cdot x$; in practice we shall usually use one of the two
examples from Example 4.2.

To perform Fourier analysis, it will be convenient to adopt the following
“ergodic” notation. Let $\mathbf{C}^Z$ denote the space of all complex-valued functions
$f : Z \to \mathbf{C}$. If $f \in \mathbf{C}^Z$, we define the mean or expectation of $f$ to be the quantity

$$\mathbf{E}_Z(f) = \mathbf{E}_{x \in Z} f(x) := \frac{1}{|Z|} \sum_{x \in Z} f(x).$$

Similarly, if $A \subseteq Z$, we define the density or probability of $A$ as

$$\mathbf{P}_Z(A) = \mathbf{P}_{x \in Z}(x \in A) := \mathbf{E}_Z(1_A) = \frac{|A|}{|Z|}.$$

We can generalize this notation to other finite non-empty domains than $Z$, thus
for instance $\mathbf{E}_{x \in A, y \in B} f(x, y) := \frac{1}{|A||B|} \sum_{x \in A, y \in B} f(x, y)$. This notation not only
suggests the connections between Fourier analysis, ergodic theory, and probability,
but is also useful in concealing from view a number of normalizing powers of $|Z|$
which would otherwise clutter the estimates. Generally, we shall use this ergodic
notation for the physical variable, but use the discrete notation $\sum_{\xi \in Z} f(\xi)$ and $|A|$
(without the normalizing $|Z|$ factor) for the frequency variable. We shall also rely
heavily on the exponential map $e : \mathbf{R}/\mathbf{Z} \to \mathbf{C}$, defined by

$$e(\theta) := e^{2\pi i \theta}. \tag{4.1}$$

¹ One way of viewing this is that the identification between $\hat Z$ and $Z$ is non-canonical, and one
should really be placing the frequency variable in $\hat Z$ instead of $Z$. This is ultimately the more
correct viewpoint; however since we shall usually be working in very concrete situations such as
cyclic groups $\mathbf{Z}_N$, where one does have a standard identification, we have chosen to rely on the
bilinear form approach here rather than the abstract approach.


The following two orthogonality properties form the foundation for Fourier 
analysis. 


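Lemma 4.5 (Orthogonality properties) For any $\xi, \xi' \in Z$ we have

$$\mathbf{E}_{x \in Z}\, e(\xi \cdot x) \overline{e(\xi' \cdot x)} = \mathbf{I}(\xi = \xi')$$

and for any $x, x' \in Z$ we have

$$\sum_{\xi \in Z} e(\xi \cdot x) \overline{e(\xi \cdot x')} = |Z|\, \mathbf{I}(x = x').$$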








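Proof We prove the first identity only, as the second is similar. Since
$e(\xi \cdot x) \overline{e(\xi' \cdot x)} = e((\xi - \xi') \cdot x)$, it will suffice to show the claim in the $\xi' = 0$
case, i.e. it suffices to show

$$\mathbf{E}_{x \in Z}\, e(\xi \cdot x) = \mathbf{I}(\xi = 0).$$

This is clear in the case $\xi = 0$. If $\xi \ne 0$, then by non-degeneracy there exists $h \in Z$
such that $e(\xi \cdot h) \ne 1$. Shifting $x$ by $h$ we then have

$$\mathbf{E}_{x \in Z}\, e(\xi \cdot x) = \mathbf{E}_{x \in Z}\, e(\xi \cdot (x + h)) = e(\xi \cdot h)\, \mathbf{E}_{x \in Z}\, e(\xi \cdot x)$$

and hence $\mathbf{E}_{x \in Z}\, e(\xi \cdot x) = 0 = \mathbf{I}(\xi = 0)$ as desired.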
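The two orthogonality identities can be checked numerically in a small case. The following is a minimal sketch, assuming the cyclic group $\mathbf{Z}_{12}$ with the form $\xi \cdot x := x\xi/N$ from Example 4.2; the function names are illustrative only and do not come from the text.

```python
import cmath

def e(theta):
    """Exponential map e(theta) = exp(2*pi*i*theta), as in (4.1)."""
    return cmath.exp(2j * cmath.pi * theta)

def physical_average(N, xi, xi2):
    """E_{x in Z_N} e(xi.x) * conj(e(xi2.x)), with xi.x := x*xi/N."""
    return sum(e(xi * x / N) * e(xi2 * x / N).conjugate() for x in range(N)) / N

def frequency_sum(N, x, x2):
    """sum_{xi in Z_N} e(xi.x) * conj(e(xi.x2)), with no normalization."""
    return sum(e(xi * x / N) * e(xi * x2 / N).conjugate() for xi in range(N))

N = 12  # assumed example group Z_12
first_identity_holds = all(
    abs(physical_average(N, a, b) - (1 if a == b else 0)) < 1e-9
    for a in range(N) for b in range(N)
)
second_identity_holds = all(
    abs(frequency_sum(N, a, b) - (N if a == b else 0)) < 1e-9
    for a in range(N) for b in range(N)
)
print(first_identity_holds, second_identity_holds)
```

The check is up to floating-point tolerance, so it illustrates the lemma rather than proving it.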

















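For every $\xi \in Z$, we can define the associated character $e_\xi \in \mathbf{C}^Z$ by
$e_\xi(x) := e(\xi \cdot x)$. The above lemma then shows that the $e_\xi$ are an orthonormal system in
$\mathbf{C}^Z$, with respect to the complex Hilbert space structure

$$\langle f, g \rangle_{\mathbf{C}^Z} := \mathbf{E}_Z(f \overline{g}) = \mathbf{E}_{x \in Z} f(x) \overline{g(x)}.$$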


Since the number |Z| of characters equals the dimension |Z| of the space, we see 
that this system is in fact a complete orthonormal system. This motivates 


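Definition 4.6 (Fourier transform) If $f \in \mathbf{C}^Z$, we define the Fourier transform
$\hat f \in \mathbf{C}^Z$ by the formula

$$\hat f(\xi) := \langle f, e_\xi \rangle_{\mathbf{C}^Z} = \mathbf{E}_{x \in Z} f(x) \overline{e(\xi \cdot x)}.$$

We refer to $\hat f(\xi)$ as the Fourier coefficient of $f$ at the frequency (or mode) $\xi$.
Since the $e_\xi$ are a complete orthonormal basis, we have the Parseval identity

$$(\mathbf{E}_Z |f|^2)^{1/2} = \Big(\sum_{\xi \in Z} |\hat f(\xi)|^2\Big)^{1/2}, \tag{4.2}$$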




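the Plancherel theorem

$$\langle f, g \rangle_{\mathbf{C}^Z} = \sum_{\xi \in Z} \hat f(\xi) \overline{\hat g(\xi)} \tag{4.3}$$

and the Fourier inversion formula

$$f = \sum_{\xi \in Z} \hat f(\xi)\, e_\xi. \tag{4.4}$$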
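As an illustration of Definition 4.6 and the identities (4.2), (4.4), here is a minimal sketch on the cyclic group $\mathbf{Z}_8$, assuming the form $\xi \cdot x := x\xi/N$. The helper names `fourier` and `inverse` and the sample function `f` are hypothetical choices for this example.

```python
import cmath

def e(theta):
    """Exponential map e(theta) = exp(2*pi*i*theta), as in (4.1)."""
    return cmath.exp(2j * cmath.pi * theta)

def fourier(f):
    """hat f(xi) = E_x f(x) * conj(e(xi.x)); note the 1/N on the physical side."""
    N = len(f)
    return [sum(f[x] * e(-xi * x / N) for x in range(N)) / N for xi in range(N)]

def inverse(fhat):
    """Inversion formula (4.4): f(x) = sum_xi hat f(xi) e(xi.x), no 1/N factor."""
    N = len(fhat)
    return [sum(fhat[xi] * e(xi * x / N) for xi in range(N)) for x in range(N)]

f = [3, 1, 4, 1, 5, 9, 2, 6]          # arbitrary sample function on Z_8
fhat = fourier(f)

mean_square = sum(abs(v) ** 2 for v in f) / len(f)     # E_Z |f|^2
coefficient_square = sum(abs(c) ** 2 for c in fhat)    # sum_xi |hat f(xi)|^2
recovered = inverse(fhat)

print(abs(mean_square - coefficient_square) < 1e-9)    # Parseval (4.2), squared
print(all(abs(recovered[x] - f[x]) < 1e-9 for x in range(len(f))))
```

The normalization convention matters here: the average sits on the physical variable and the plain sum on the frequency variable, matching the ergodic notation adopted earlier.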


In particular we see that two functions are equal if and only if their Fourier
coefficients match at every frequency. In other words, the Fourier transform is a bijection
from $\mathbf{C}^Z$ to $\mathbf{C}^Z$ (in fact it is a unitary isometry, thanks to (4.2), (4.3)).

From Lemma 4.5 we see that the Fourier coefficients of a character $e_\xi$ are just
a Kronecker delta function:

$$\hat{e_\xi}(\xi') = \mathbf{I}(\xi = \xi').$$

In particular $\hat 1(\xi) = \mathbf{I}(\xi = 0)$.

A special role in the additive theory of the Fourier transform is played by the
zero frequency $\xi = 0$. This is because the zero Fourier coefficient is the same concept
as expectation:

$$\hat f(0) = \langle f, 1 \rangle_{\mathbf{C}^Z} = \mathbf{E}_Z(f). \tag{4.5}$$


If $S$ is any subset of $Z$, define the orthogonal complement $S^\perp \subseteq Z$ of $S$ to be
the set

$$S^\perp := \{\xi \in Z : \xi \cdot x = 0 \text{ for all } x \in S\}.$$

One can easily verify that $S^\perp$ is a subgroup of $Z$. Also one has the pleasant identity

$$\hat{1_G} = \mathbf{P}_Z(G)\, 1_{G^\perp} \tag{4.6}$$

whenever $G$ is a subgroup; see Exercise 4.1.6. Applying (4.2) we see in particular
that

$$|G||G^\perp| = |Z|. \tag{4.7}$$
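A small worked instance of (4.6) and (4.7) may help. The sketch below assumes $Z = \mathbf{Z}_{12}$ with the form $\xi \cdot x := x\xi/N$ and takes the subgroup $G = \{0, 3, 6, 9\}$; these choices and the helper names are assumptions made for illustration.

```python
import cmath

def e(theta):
    """Exponential map e(theta) = exp(2*pi*i*theta), as in (4.1)."""
    return cmath.exp(2j * cmath.pi * theta)

def fourier(f):
    """hat f(xi) = E_x f(x) * conj(e(xi.x)) on Z_N with xi.x := x*xi/N."""
    N = len(f)
    return [sum(f[x] * e(-xi * x / N) for x in range(N)) / N for xi in range(N)]

def orthogonal_complement(N, S):
    """All xi with xi.x = 0 in R/Z for every x in S, i.e. xi*x divisible by N."""
    return {xi for xi in range(N) if all((xi * x) % N == 0 for x in S)}

N = 12
G = {0, 3, 6, 9}                       # subgroup of Z_12 generated by 3
G_perp = orthogonal_complement(N, G)

indicator = [1 if x in G else 0 for x in range(N)]
transform = fourier(indicator)
density = len(G) / N

# (4.6): hat 1_G equals P_Z(G) on G_perp and vanishes elsewhere.
identity_46 = all(
    abs(transform[xi] - (density if xi in G_perp else 0)) < 1e-9
    for xi in range(N)
)
print(sorted(G_perp), len(G) * len(G_perp) == N, identity_46)
```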


We now introduce the fundamental notion of convolution, which links the 
Fourier transform to the theory of sum sets. 


Definition 4.7 (Convolution) If $f, g \in L^2(Z)$ are random variables, we define
their convolution $f * g$ to be the random variable

$$f * g(x) := \mathbf{E}_{y \in Z} f(x - y) g(y) = \mathbf{E}_{y \in Z} f(y) g(x - y).$$

We also define the support $\mathrm{supp}(f)$ of $f$ to be the set
$\mathrm{supp}(f) := \{f \ne 0\} = \{x \in Z : f(x) \ne 0\}$.




The significance of convolution to sum sets lies in the obvious inclusion

$$\mathrm{supp}(f * g) \subseteq \mathrm{supp}(f) + \mathrm{supp}(g)$$

and particularly in the identity

$$A + B = \mathrm{supp}(1_A * 1_B).$$

Indeed we have the more precise statement

$$1_A * 1_B(x) = \mathbf{P}_Z(A \cap (x - B)). \tag{4.8}$$

The relevance of the Fourier transform to convolution lies in the easily verified
identity

$$\widehat{f * g} = \hat f \cdot \hat g. \tag{4.9}$$

Applying (4.9) at the zero frequency we have the basic formula

$$\mathbf{E}_Z(f * g) = (\mathbf{E}_Z f)(\mathbf{E}_Z g). \tag{4.10}$$

In particular, if $f$ or $g$ has mean zero, then so does $f * g$.
As one consequence of (4.9) we see that convolution is bilinear, symmetric,
and associative. We also have a dual version of (4.9), namely the formula

$$\widehat{fg}(\xi) = \sum_{\eta \in Z} \hat f(\eta)\, \hat g(\xi - \eta) \tag{4.11}$$

which converts pointwise product back to convolution; we leave the verification
of these identities as an exercise.
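The identities (4.8) and (4.9), and the description of $A + B$ as a support, can be tested on a toy example. This is a sketch under the assumption $Z = \mathbf{Z}_{11}$ with $\xi \cdot x := x\xi/N$; the sets `A`, `B` and the helper names are arbitrary illustrative choices.

```python
import cmath

def e(theta):
    """Exponential map e(theta) = exp(2*pi*i*theta), as in (4.1)."""
    return cmath.exp(2j * cmath.pi * theta)

def fourier(f):
    """hat f(xi) = E_x f(x) * conj(e(xi.x)) on Z_N."""
    N = len(f)
    return [sum(f[x] * e(-xi * x / N) for x in range(N)) / N for xi in range(N)]

def convolve(f, g):
    """Normalized convolution f*g(x) = E_y f(x - y) g(y) from Definition 4.7."""
    N = len(f)
    return [sum(f[(x - y) % N] * g[y] for y in range(N)) / N for x in range(N)]

N = 11
A, B = {0, 1, 2}, {0, 5}
one_A = [1 if x in A else 0 for x in range(N)]
one_B = [1 if x in B else 0 for x in range(N)]
conv = convolve(one_A, one_B)

# A + B is the support of 1_A * 1_B.
sumset = {(a + b) % N for a in A for b in B}
support = {x for x in range(N) if abs(conv[x]) > 1e-12}

# (4.8): 1_A * 1_B(x) is the density of A intersected with x - B.
identity_48 = all(
    abs(conv[x] - len(A & {(x - b) % N for b in B}) / N) < 1e-12
    for x in range(N)
)

# (4.9): the transform of a convolution is the product of the transforms.
lhs, fa, fb = fourier(conv), fourier(one_A), fourier(one_B)
identity_49 = all(abs(lhs[xi] - fa[xi] * fb[xi]) < 1e-9 for xi in range(N))

print(support == sumset, identity_48, identity_49)
```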
In the exercises below, $Z$ is a fixed finite additive group, with a fixed symmetric
non-degenerate bilinear form $\cdot$.


Exercises 


4.1.1 Let $\hat Z$ be the additive group consisting of all the homomorphisms from
$Z$ to $\mathbf{R}/\mathbf{Z}$. Show that the identification of a frequency $\xi \in Z$ with the
homomorphism $x \mapsto \xi \cdot x$ gives an isomorphism from $Z$ to $\hat Z$.

4.1.2 Define a character to be any map $\chi : Z \to \mathbf{C}$ with $\chi(0) = 1$ and
$\chi(x + y) = \chi(x)\chi(y)$ for all $x, y \in Z$. Show that the set of all characters
is precisely $\{e_\xi : \xi \in Z\}$.

4.1.3 Show that for any $\xi \in Z$, $e_\xi$ takes values in the $|Z|$th roots of unity.

4.1.4 Define a linear phase function to be any map $\phi : Z \to \mathbf{R}/\mathbf{Z}$ with the
property that

$$\phi(x + h_1 + h_2) - \phi(x + h_1) - \phi(x + h_2) + \phi(x) = 0 \quad \text{for all } x, h_1, h_2 \in Z.$$

Show that $\phi : Z \to \mathbf{R}/\mathbf{Z}$ is a linear phase function if and only if there
exists $\xi \in Z$ and $c \in \mathbf{R}/\mathbf{Z}$ such that $\phi(x) = \xi \cdot x + c$ for all $x$. (The map
$\phi$ is also a Freiman homomorphism of order 2; see Definition 5.21.)

4.1.5 Let $x$ be an element of $Z$ chosen uniformly at random. Show that the
random variables $\{e_\xi(x) : \xi \in Z\}$ are pairwise independent, and have
variance 1 and mean zero for $\xi \ne 0$, and variance 0 and mean 1 for $\xi = 0$.
Use this and (1.9), (4.4) to give an alternative proof of (4.2).

4.1.6 Prove (4.6).

4.1.7 Let $f : Z \to \mathbf{C}$. If $H$ is a subgroup of $Z$, and $g := f 1_H$, show that

$$\hat g(\xi) = \mathbf{E}_{\eta \in H^\perp} \hat f(\xi + \eta) \quad \text{for all } \xi \in Z$$

and conclude in particular the Poisson summation formula

$$\mathbf{E}_{x \in H} f(x) = \sum_{\xi \in H^\perp} \hat f(\xi).$$

In the converse direction, if $h = f * \frac{1}{\mathbf{P}_Z(H)} 1_H$ is the average of $f$ on cosets
of $H$, i.e.

$$h(x) := \mathbf{E}_{y \in H} f(x + y),$$

show that $\hat h = \hat f \cdot 1_{H^\perp}$.

4.1.8 If $\phi : Z \to Z$ is a group isomorphism of $Z$, then there exists a unique
group isomorphism $\phi^\dagger : Z \to Z$, called the adjoint of $\phi$, such that
$\xi \cdot \phi(x) = \phi^\dagger(\xi) \cdot x$ for all $x, \xi \in Z$. Furthermore if $g(x) = f(\phi(x))$ for
all $x \in Z$ then $\hat g(\xi) = \hat f((\phi^\dagger)^{-1}(\xi))$ for all $\xi \in Z$.

4.1.9 If $\phi : Z \to Z$ and $\psi : Z \to Z$ are group isomorphisms, show that
$(\phi \circ \psi)^\dagger = \psi^\dagger \circ \phi^\dagger$.

4.1.10 Let $\bullet : Z \times Z \to \mathbf{R}/\mathbf{Z}$ and $\circ : Z \times Z \to \mathbf{R}/\mathbf{Z}$ be two non-degenerate
symmetric bilinear forms on a finite additive group $Z$. Show that there exists a
self-adjoint group isomorphism $\phi : Z \to Z$ such that $\xi \circ x = \xi \bullet \phi(x) =
\phi^\dagger(\xi) \bullet x$ for all $x, \xi \in Z$. This shows that all Fourier transforms are
equivalent up to isomorphisms of either the $x$ or $\xi$ variable.

4.1.11 Prove (4.9) and (4.11).

4.1.12 Let $x$ be an element of $Z$ chosen uniformly at random, and let
$\xi_1,\ldots,\xi_n \in Z$. Show that the random variables $e_{\xi_1}(x),\ldots,e_{\xi_n}(x)$ are jointly
independent if and only if the group $\langle \xi_1,\ldots,\xi_n \rangle$ generated by $\xi_1,\ldots,\xi_n$ has order
$\mathrm{ord}(\xi_1) \cdots \mathrm{ord}(\xi_n)$.

4.1.13 Let $G, H$ be two subgroups of $Z$. Show that $(G + H)^\perp = G^\perp \cap H^\perp$,
$(G \cap H)^\perp = G^\perp + H^\perp$, and $d(G^\perp, H^\perp) = d(G, H)$, where $d$ is the
Ruzsa distance defined in Definition 2.5. This may help explain the
symmetric nature of $G + H$ and $G \cap H$ in the estimates in Exercise 2.3.11.

4.1.14 Let $G, H$ be two subgroups of $Z$ and let $x$ be an element of $Z$ chosen
randomly. Show that the indicators $\mathbf{I}(x \in G)$ and $\mathbf{I}(x \in H)$ have
non-negative correlation, i.e. $\mathrm{Cov}(\mathbf{I}(x \in G), \mathbf{I}(x \in H)) \ge 0$; establish this
both by Fourier-analytic means and by direct computation. Show that
equality occurs if and only if $G + H = Z$.

4.1.15 Show that for any subgroup $G$ of $Z$, we have $(G^\perp)^\perp = G$, and for any
random variable $f$, we have $\hat{\hat f}(x) = |Z|^{-1} f(-x)$. More generally, for
any $A \subseteq Z$, we have $\langle A \rangle = (A^\perp)^\perp$, where $\langle A \rangle$ is the group generated by
$A$.

4.1.16 If $Z$ and $Z'$ are finite groups, formulate a rigorous version of the statement
that the Fourier transform on $Z \times Z'$ is the composition of the Fourier
transform on $Z$ and the Fourier transform on $Z'$.


4.2 $L^p$ theory


We now turn to the analytic theory of the Fourier transform and of convolutions,
starting with the $L^p$ theory, and then apply it to the problem of locating arithmetic
progressions inside sum sets.

If $f \in \mathbf{C}^Z$ and $0 < p < \infty$, we define the $L^p(Z)$ norm of $f$ to be the quantity

$$\|f\|_{L^p(Z)} := (\mathbf{E}_Z |f|^p)^{1/p} = (\mathbf{E}_{x \in Z} |f(x)|^p)^{1/p}.$$

Thus for instance $\|f\|_{L^2(Z)}$ is just the Hilbert space magnitude of $f$. We also define

$$\|f\|_{L^\infty(Z)} := \sup_{x \in Z} |f(x)|.$$

Similarly we define

$$\|f\|_{l^p(Z)} := \Big(\sum_{\xi \in Z} |f(\xi)|^p\Big)^{1/p}$$

for $0 < p < \infty$ and

$$\|f\|_{l^\infty(Z)} := \sup_{\xi \in Z} |f(\xi)|.$$

We have the following two basic $L^p$ estimates on the Fourier transform and on
convolution.


Theorem 4.8 Let $f, g : Z \to \mathbf{C}$ be functions on an additive group $Z$. Then for
any $1 \le p \le 2$ we have the Hausdorff-Young inequality

$$\|\hat f\|_{l^{p'}(Z)} \le \|f\|_{L^p(Z)} \tag{4.12}$$

where the dual exponent $p'$ to $p$ is defined by $\frac{1}{p} + \frac{1}{p'} = 1$. Also, whenever
$1 \le p, q, r \le \infty$ are such that $\frac{1}{p} + \frac{1}{q} = \frac{1}{r} + 1$, we have the Young inequality

$$\|f * g\|_{L^r(Z)} \le \|f\|_{L^p(Z)} \|g\|_{L^q(Z)}. \tag{4.13}$$


Both inequalities follow easily from the Riesz-Thorin complex interpolation
theorem. With this theorem, one only needs to verify the extremal (and easy) cases.
The Riesz-Thorin theorem, however, is beyond the scope of this book. On the
other hand, one can also give an elementary proof, using combinatorial arguments
(see Exercise 4.2.3).
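Both inequalities of Theorem 4.8 can be spot-checked on random data. The sketch below assumes $Z = \mathbf{Z}_{16}$, the exponent $p = 4/3$ (so $p' = 4$) for (4.12), and $p = q = 4/3$, $r = 2$ for (4.13); the random test functions and all helper names are illustrative assumptions, and a numerical check on one sample is of course not a proof.

```python
import cmath
import random

def e(theta):
    """Exponential map e(theta) = exp(2*pi*i*theta), as in (4.1)."""
    return cmath.exp(2j * cmath.pi * theta)

def fourier(f):
    """hat f(xi) = E_x f(x) * conj(e(xi.x)) on Z_N."""
    N = len(f)
    return [sum(f[x] * e(-xi * x / N) for x in range(N)) / N for xi in range(N)]

def convolve(f, g):
    """Normalized convolution f*g(x) = E_y f(x - y) g(y)."""
    N = len(f)
    return [sum(f[(x - y) % N] * g[y] for y in range(N)) / N for x in range(N)]

def physical_norm(f, p):
    """L^p(Z) norm: (E_x |f(x)|^p)^(1/p), with the averaging normalization."""
    return (sum(abs(v) ** p for v in f) / len(f)) ** (1 / p)

def frequency_norm(fhat, p):
    """l^p(Z) norm: (sum_xi |fhat(xi)|^p)^(1/p), with no normalization."""
    return sum(abs(c) ** p for c in fhat) ** (1 / p)

rng = random.Random(0)
N = 16
f = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(N)]
g = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(N)]

p, p_dual = 4 / 3, 4                      # 1/p + 1/p' = 1
hausdorff_young = frequency_norm(fourier(f), p_dual) <= physical_norm(f, p) + 1e-12

# Young with p = q = 4/3 and r = 2, since 3/4 + 3/4 = 1/2 + 1.
young = physical_norm(convolve(f, g), 2) <= physical_norm(f, p) * physical_norm(g, p) + 1e-12

print(hausdorff_young, young)
```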

Recall the additive energy $E(A, B)$ between two additive sets $A, B$ in $Z$, defined
in Definition 2.8. From that definition one can easily check that

$$E(A, B) = |Z|^3 \|1_A * 1_B\|_{L^2(Z)}^2.$$

By (4.2) and (4.9) we obtain the fundamental identity

$$E(A, B) = |Z|^3 \sum_{\xi \in Z} |\hat 1_A(\xi)|^2 |\hat 1_B(\xi)|^2. \tag{4.14}$$

This formula may illuminate some of the properties of the additive energy that were
obtained in Section 2.3, such as the symmetries $E(A, B) = E(B, A) = E(A, -B)$
and the Cauchy-Schwarz inequality (2.9); see Exercise 4.2.7.
For the purposes of additive combinatorics, the Fourier transform is most useful
when applied to characteristic functions $f = 1_A$, and in this case one can say quite
a bit about the Fourier transform and its relation to the additive energy $E(A, A)$.
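The identity (4.14) can be compared against a direct count. This sketch assumes that the additive energy of Definition 2.8 is the number of quadruples $(a, a', b, b') \in A \times A \times B \times B$ with $a + b = a' + b'$, and works in $\mathbf{Z}_{13}$ with arbitrarily chosen sets; the function names are hypothetical.

```python
import cmath

def e(theta):
    """Exponential map e(theta) = exp(2*pi*i*theta), as in (4.1)."""
    return cmath.exp(2j * cmath.pi * theta)

def indicator_transform(A, N):
    """Fourier transform of 1_A on Z_N: (1/N) * sum_{x in A} conj(e(xi.x))."""
    return [sum(e(-xi * x / N) for x in A) / N for xi in range(N)]

def additive_energy(A, B, N):
    """Count quadruples (a, a2, b, b2) in A x A x B x B with a + b = a2 + b2."""
    return sum(
        1
        for a in A for a2 in A for b in B for b2 in B
        if (a + b - a2 - b2) % N == 0
    )

N = 13
A, B = {0, 1, 2, 3}, {0, 2, 4}

direct = additive_energy(A, B, N)
fa, fb = indicator_transform(A, N), indicator_transform(B, N)
via_fourier = N ** 3 * sum(abs(fa[xi]) ** 2 * abs(fb[xi]) ** 2 for xi in range(N))

print(direct, round(via_fourier, 6))
```

The same computation with $B = A$ gives the quantity appearing in (4.18).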


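Lemma 4.9 Let $A$ be a subset of a finite additive group $Z$, and let $\hat 1_A : Z \to \mathbf{C}$
be the Fourier transform of the characteristic function of $A$. Then we have the
identities:

$$\|\hat 1_A\|_{l^\infty(Z)} = \sup_{\xi \in Z} |\hat 1_A(\xi)| = \hat 1_A(0) = \mathbf{P}_Z(A); \tag{4.15}$$

$$\|\hat 1_A\|_{l^2(Z)}^2 = \sum_{\xi \in Z} |\hat 1_A(\xi)|^2 = \mathbf{P}_Z(A); \tag{4.16}$$

$$\hat 1_A(\xi) = \overline{\hat 1_A(-\xi)}; \tag{4.17}$$

$$\|\hat 1_A\|_{l^4(Z)}^4 = \sum_{\xi \in Z} |\hat 1_A(\xi)|^4 = \frac{E(A, A)}{|Z|^3}; \tag{4.18}$$

$$\hat 1_A(\xi) = \sum_{\eta \in Z} \hat 1_A(\eta)\, \hat 1_A(\xi - \eta). \tag{4.19}$$

This lemma follows easily from the estimates that have already been established;
see Exercise 4.2.4.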




We now present a simple application of the Fourier transform in the setting of 
a finite field F. 


Lemma 4.10 [41] Let $F$ be a finite field, and let $A$ be a subset of $F \setminus \{0\}$ such that
$|A| > |F|^{3/4}$. Then

$$3(A \cdot A) = A \cdot A + A \cdot A + A \cdot A = F.$$


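Proof We give $F$ a symmetric non-degenerate bilinear form of the type in
Example 4.2. Let $f : F \to \mathbf{R}$ denote the non-negative function

$$f := \mathbf{E}_{a \in A} 1_{a \cdot A}.$$

Observe that $\mathrm{supp}(f) = A \cdot A$ and $\hat f(0) = \mathbf{E}_F f = \mathbf{P}_F(A)$. Taking Fourier
transforms we obtain

$$\hat f(\xi) = \mathbf{E}_{a \in A} \hat 1_A(\xi/a)$$

for any $\xi \in F$. If $\xi \ne 0$, then we observe that the frequencies $\xi/a$ are all distinct
as $a$ varies. Using Cauchy-Schwarz and then (4.16), we then obtain

$$|\hat f(\xi)| \le \frac{1}{|A|^{1/2}} \mathbf{P}_F(A)^{1/2} = 1/|F|^{1/2} \quad \text{for } \xi \ne 0.$$

Now let $x \in F$ be arbitrary. We use (4.4) and (4.9) to compute

$$\begin{aligned}
f * f * f(x) &= \mathrm{Re}\, f * f * f(x) \\
&= \mathrm{Re} \sum_{\xi \in F} \hat f(\xi)^3 e(\xi \cdot x) \\
&\ge \hat f(0)^3 - \sum_{\xi \in F \setminus \{0\}} |\hat f(\xi)|^3 \\
&\ge \mathbf{P}_F(A)^3 - |F|^{-1/2} \sum_{\xi \in F} |\hat f(\xi)|^2 \\
&\ge \mathbf{P}_F(A)^3 - |F|^{-1/2} \mathbf{P}_F(A) \\
&> 0
\end{aligned}$$

since $\mathbf{P}_F(A) > |F|^{-1/4}$ by hypothesis; the penultimate step uses (4.2) together with
$\mathbf{E}_F |f|^2 \le \mathbf{E}_F f$, which holds because $0 \le f \le 1$. Since $\mathrm{supp}(f * f * f) = 3(A \cdot A)$ and $x$
was arbitrary, we are done.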
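The conclusion of Lemma 4.10 is easy to test by brute force in a small field. The following sketch assumes the prime field $\mathbf{Z}_{17}$ and samples random sets of size 9, which exceeds $17^{3/4} \approx 8.37$; the sampling scheme and helper names are assumptions for illustration, and the check covers only the sampled sets.

```python
import random

def product_set(A, p):
    """A . A = {a*b mod p : a, b in A} in the prime field Z_p."""
    return {(a * b) % p for a in A for b in A}

def sum_set(X, Y, p):
    """X + Y = {x + y mod p : x in X, y in Y}."""
    return {(x + y) % p for x in X for y in Y}

def three_fold_sum_of_products(A, p):
    """3(A . A) = A.A + A.A + A.A."""
    AA = product_set(A, p)
    return sum_set(sum_set(AA, AA, p), AA, p)

p = 17
size = 9                                   # 9 > 17 ** 0.75, roughly 8.37
assert size > p ** 0.75

rng = random.Random(0)
trials = [set(rng.sample(range(1, p), size)) for _ in range(200)]
all_cover = all(
    three_fold_sum_of_products(A, p) == set(range(p)) for A in trials
)
print(all_cover)
```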














Remark 4.11 Lemma 4.10 is a simple example of a sum-product estimate, that is, an
assertion that a combination of a sum and product of a set $A$ is necessarily much
larger than $A$ itself. It can be viewed as a quantitative reflection of the fact that a
set $A$ of cardinality greater than $|F|^{3/4}$ has difficulty behaving like a subfield of $F$.
It should be compared with the results in Section 2.8.




Exercises 


4.2.1 Let $1 \le p < \infty$. By exploiting the convexity of the function $x \mapsto |x|^p$,
establish the convexity of the set $\{f \in \mathbf{C}^Z : \|f\|_{L^p(Z)} \le 1\}$, and conclude
the triangle inequality

$$\|f + g\|_{L^p(Z)} \le \|f\|_{L^p(Z)} + \|g\|_{L^p(Z)}.$$

Argue similarly for the $p = \infty$ case and with $L^p$ replaced by $l^p$.

4.2.2 Let $1 < p < \infty$, and let $p'$ be the dual exponent, thus $1/p + 1/p' = 1$. By
exploiting the convexity of the function $x \mapsto e^x$, establish the preliminary
inequality

$$\mathbf{E}_{x \in Z} |f(x)||g(x)| \le 1 \quad \text{whenever } \|f\|_{L^p(Z)}, \|g\|_{L^{p'}(Z)} \le 1,$$

and then conclude Hölder’s inequality

$$\|fg\|_{L^r(Z)} \le \|f\|_{L^p(Z)} \|g\|_{L^q(Z)}$$

whenever $0 < p, q, r \le \infty$ are such that $\frac{1}{p} + \frac{1}{q} = \frac{1}{r}$. Similarly with the
$L^p$ norms replaced by $l^p$ norms.

4.2.3 The purpose of this exercise is to give a proof of Theorem 4.8 that does
not require complex interpolation. First use (4.2), the trivial bound

$$\|\hat f\|_{l^\infty(Z)} \le \|f\|_{L^1(Z)}, \tag{4.20}$$

and Hölder’s inequality to establish the weaker estimate

$$\|\hat f\|_{l^{p'}(Z)} = O_p(\|f\|_{L^p(Z)})$$

whenever $f \in \mathbf{C}^Z$ is supported on a set $A$ and obeys an estimate of the
form $|f(x)| = \Theta(\lambda)$ for all $x \in A$ and some threshold $\lambda$. Then, prove the
even weaker estimate

$$\|\hat f\|_{l^{p'}(Z)} = O_p(\|f\|_{L^p(Z)} \log(1 + |Z|))$$

for arbitrary $f \in \mathbf{C}^Z$ by applying the previous inequality to a dyadic
decomposition of $f$, followed by the triangle inequality. Finally, remove
the $O_p(\log(1 + |Z|))$ factor to establish (4.12) by replacing $Z$ with a large
power $Z^M$ of $Z$, and similarly replacing $f$ with a large tensor power (as in
Corollary 2.19) and letting $M \to \infty$. Argue similarly to establish (4.13).

4.2.4 Prove Lemma 4.9.

4.2.5 Let $A$ be an additive set in a finite additive group $Z$. Show that $\hat 1_A$ is
real-valued if and only if $A$ is symmetric.

4.2.6 (Law of large numbers for finite groups) Let $f : Z \to \mathbf{R}_{\ge 0}$ be such that
$\mathbf{E}_Z f = 1$ and $f(0) \ne 0$, and let $H$ be the subgroup of $Z$ generated by
$\mathrm{supp}(f)$. Show that $|\hat f(\xi)| \le 1$, with equality if and only if $\xi \in H^\perp$.
Next, define the iterated convolutions $f^{(n)}$ for $n = 1, 2, \ldots$ inductively
by $f^{(1)} := f$ and $f^{(n+1)} := f * f^{(n)}$, and show that
$\lim_{n \to \infty} f^{(n)} = \frac{1}{\mathbf{P}_Z(H)} 1_H$. What can happen when the hypothesis $f(0) \ne 0$ is dropped?

4.2.7 Use Fourier-analytic methods to give another proof of Corollary 2.10.

4.2.8 Use Fourier-analytic methods to give another proof of Proposition 2.7.

4.2.9 Let $f$ be a random variable which is not identically zero. By using (4.2)
and (4.20), establish the uncertainty principle

$$|\mathrm{supp}(f)||\mathrm{supp}(\hat f)| \ge |Z|. \tag{4.21}$$

Prove that equality occurs if and only if $f(x) = c\, e(\xi \cdot x) 1_{x_0 + H}(x)$ for
some complex number $c \in \mathbf{C}$, some subgroup $H$ of $Z$, and some
$\xi, x_0 \in Z$. This inequality can be improved for certain groups $Z$: see
Theorem 9.52.

4.2.10 Let $f \in \mathbf{C}^Z$ be normalized so that $\|f\|_{L^2(Z)} = (\sum_{\xi \in Z} |\hat f(\xi)|^2)^{1/2} = 1$. By
differentiating the Hausdorff-Young inequality in $p$, establish the entropy
uncertainty principle

$$\mathbf{E}_{x \in Z} |f(x)|^2 \log \frac{1}{|f(x)|^2} + \sum_{\xi \in Z} |\hat f(\xi)|^2 \log \frac{1}{|\hat f(\xi)|^2} \ge 0,$$

where we adopt the convention that $0 \log \frac{1}{0} = 0$. (Hint: differentiate the
Hausdorff-Young inequality in $p$ at $p = 2$, using the fact that equality
holds at that endpoint.) Using Jensen’s inequality, show that this inequality
implies (4.21).


4.3 Linear bias 


One common way to apply the Fourier transform to the theory of sum sets or 
to arithmetic progressions is to introduce the notion of Fourier bias of that set 
(also known as linear bias or pseudo-randomness). Roughly speaking, this notion 
separates sets into two extremes, ones which are highly uniform (and behave like 
random sets, especially with regard to iterated sum sets), and ones which are highly 
non-uniform (and behave like arithmetic progressions). 


Definition 4.12 (Fourier bias) Let $Z$ be a finite additive group. If $A$ is a subset
of $Z$, we define the Fourier bias $\|A\|_u$ of the set $A$ to be the quantity

$$\|A\|_u := \sup_{\xi \in Z \setminus \{0\}} |\hat 1_A(\xi)|.$$

This quantity is always non-negative, with $\|A\|_u = 0$ if and only if $A$ is equal to
$Z$ or the empty set (Exercise 4.3.1). It obeys the symmetries $\|A\|_u = \|-A\|_u =
\|A + h\|_u = \|Z \setminus A\|_u$ for any $h \in Z$ (Exercise 4.3.2). We warn that this quantity
is not monotone: $A \subseteq B$ does not imply $\|A\|_u \le \|B\|_u$. However, the Fourier bias
does obey a triangle inequality (Exercise 4.3.3). The Fourier bias $\|A\|_u$ can be as
large as the density $\mathbf{P}_Z(A)$, but is usually smaller (Exercise 4.3.4). Sets $A$ with
Fourier bias less than $\alpha$ are sometimes called $\alpha$-uniform or $\alpha$-pseudo-random; sets
with small Fourier bias are called linearly uniform, Gowers uniform of order 1, or
pseudo-random.
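To get a feel for Definition 4.12, one can compare the Fourier bias of a structured set with that of a more random-looking one. The sketch below assumes $Z = \mathbf{Z}_{101}$ with $\xi \cdot x := x\xi/N$ and contrasts an interval of length 50 with the set of squares; the choice of sets and the function name `fourier_bias` are illustrative assumptions.

```python
import cmath

def e(theta):
    """Exponential map e(theta) = exp(2*pi*i*theta), as in (4.1)."""
    return cmath.exp(2j * cmath.pi * theta)

def fourier_bias(A, N):
    """||A||_u: the largest |hat 1_A(xi)| over non-zero frequencies xi in Z_N."""
    return max(abs(sum(e(-xi * x / N) for x in A)) / N for xi in range(1, N))

N = 101
progression = set(range(50))                   # arithmetic progression 0, 1, ..., 49
squares = {(a * a) % N for a in range(N)}      # squares modulo the prime 101
whole_group = set(range(N))

bias_progression = fourier_bias(progression, N)
bias_squares = fourier_bias(squares, N)
bias_whole = fourier_bias(whole_group, N)

print(round(bias_progression, 4), round(bias_squares, 4), bias_whole < 1e-9)
```

The two sets have nearly the same density, yet the progression has bias comparable to its density while the squares do not, which is the dichotomy the section describes.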

The connection between Fourier bias and sum sets can be described by the 
following lemma. 


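Lemma 4.13 (Uniformity implies large sum sets) Let $n \ge 3$, and let
$A_1,\ldots,A_n$ be additive sets in a finite additive group $Z$. Then for any $x \in Z$
we have

$$\left| \frac{1}{|Z|^{n-1}} |\{(a_1,\ldots,a_n) \in A_1 \times \cdots \times A_n : x = a_1 + \cdots + a_n\}| - \mathbf{P}_Z(A_1) \cdots \mathbf{P}_Z(A_n) \right|
\le \|A_1\|_u \cdots \|A_{n-2}\|_u\, \mathbf{P}_Z(A_{n-1})^{1/2} \mathbf{P}_Z(A_n)^{1/2}.$$

In particular, if we have

$$\|A_1\|_u \cdots \|A_{n-2}\|_u < \mathbf{P}_Z(A_1) \cdots \mathbf{P}_Z(A_{n-2})\, \mathbf{P}_Z(A_{n-1})^{1/2} \mathbf{P}_Z(A_n)^{1/2}$$

then $A_1 + \cdots + A_n = Z$.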


Of course, a similar result is true if we permute the $A_1,\ldots,A_n$. Note
that the quantity $\mathbf{P}_Z(A_1) \cdots \mathbf{P}_Z(A_n)$ is the quantity one would expect for
$\frac{1}{|Z|^{n-1}} |\{(a_1,\ldots,a_n) \in A_1 \times \cdots \times A_n : x = a_1 + \cdots + a_n\}|$ if the events
$a_1 \in A_1,\ldots,a_n \in A_n$ were jointly independent conditioning on $x = a_1 + \cdots + a_n$.
This may help explain why uniformity is sometimes referred to as
pseudo-randomness.


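Proof By (4.9), the function $1_{A_1} * \cdots * 1_{A_n}$ has Fourier transform $\hat 1_{A_1} \cdots \hat 1_{A_n}$.
Applying the Fourier inversion formula (4.4), (4.15), the Cauchy-Schwarz
inequality and (4.16) we thus see that

$$\begin{aligned}
1_{A_1} * \cdots * 1_{A_n}(x) &= \mathrm{Re}\, 1_{A_1} * \cdots * 1_{A_n}(x) \\
&= \mathrm{Re} \sum_{\xi \in Z} \hat 1_{A_1}(\xi) \cdots \hat 1_{A_n}(\xi)\, e(\xi \cdot x) \\
&\ge \hat 1_{A_1}(0) \cdots \hat 1_{A_n}(0) - \sum_{\xi \in Z \setminus \{0\}} |\hat 1_{A_1}(\xi)| \cdots |\hat 1_{A_n}(\xi)| \\
&\ge \mathbf{P}_Z(A_1) \cdots \mathbf{P}_Z(A_n) - \|A_1\|_u \cdots \|A_{n-2}\|_u \sum_{\xi \in Z} |\hat 1_{A_{n-1}}(\xi)||\hat 1_{A_n}(\xi)| \\
&\ge \mathbf{P}_Z(A_1) \cdots \mathbf{P}_Z(A_n) - \|A_1\|_u \cdots \|A_{n-2}\|_u \|\hat 1_{A_{n-1}}\|_{l^2(Z)} \|\hat 1_{A_n}\|_{l^2(Z)} \\
&= \mathbf{P}_Z(A_1) \cdots \mathbf{P}_Z(A_n) - \|A_1\|_u \cdots \|A_{n-2}\|_u\, \mathbf{P}_Z(A_{n-1})^{1/2} \mathbf{P}_Z(A_n)^{1/2}.
\end{aligned}$$

A similar argument gives

$$1_{A_1} * \cdots * 1_{A_n}(x) \le \mathbf{P}_Z(A_1) \cdots \mathbf{P}_Z(A_n) + \|A_1\|_u \cdots \|A_{n-2}\|_u\, \mathbf{P}_Z(A_{n-1})^{1/2} \mathbf{P}_Z(A_n)^{1/2}.$$

By definition of convolution we have

$$1_{A_1} * \cdots * 1_{A_n}(x) = \frac{1}{|Z|^{n-1}} |\{(a_1,\ldots,a_n) \in A_1 \times \cdots \times A_n : x = a_1 + \cdots + a_n\}|,$$

and the lemma follows.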
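The inequality of Lemma 4.13 can be checked numerically in the simplest case $n = 3$. The sketch below assumes $Z = \mathbf{Z}_{31}$ and three random sets of size 15; the seed, the set sizes, and the helper names are arbitrary illustrative choices.

```python
import cmath
import random

def e(theta):
    """Exponential map e(theta) = exp(2*pi*i*theta), as in (4.1)."""
    return cmath.exp(2j * cmath.pi * theta)

def fourier_bias(A, N):
    """||A||_u: the largest |hat 1_A(xi)| over non-zero frequencies xi in Z_N."""
    return max(abs(sum(e(-xi * x / N) for x in A)) / N for xi in range(1, N))

def density(A, N):
    """P_Z(A) = |A| / |Z|."""
    return len(A) / N

N = 31
rng = random.Random(1)
A1, A2, A3 = (set(rng.sample(range(N), 15)) for _ in range(3))

def normalized_count(x):
    """(1/N^2) * number of (a1, a2, a3) in A1 x A2 x A3 with a1 + a2 + a3 = x."""
    hits = sum(1 for a1 in A1 for a2 in A2 if (x - a1 - a2) % N in A3)
    return hits / N ** 2

expected = density(A1, N) * density(A2, N) * density(A3, N)
# For n = 3 the product of biases reduces to the single factor ||A1||_u.
bound = fourier_bias(A1, N) * (density(A2, N) * density(A3, N)) ** 0.5
worst_deviation = max(abs(normalized_count(x) - expected) for x in range(N))

print(worst_deviation <= bound + 1e-12)
```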





We now give an application of the above machinery to the finite field Waring 
problem. We first need a standard lemma. 


Lemma 4.14 (Gauss sum estimate) Let $F$ be a finite field of odd order, and let
$A := F^{\wedge 2} = \{a^2 : a \in F\}$ be the set of squares in $F$. Then $\|A\|_u \le \frac{1}{2|F|^{1/2}} + \frac{1}{2|F|}$.


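Proof Let $\xi \in F \setminus \{0\}$. Since every non-zero element in $A$ has exactly two
representations of the form $a^2$, we have

$$\hat 1_A(\xi) = \frac{1}{|F|} \sum_{x \in A} e(-\xi \cdot x) = \frac{1}{2|F|} + \frac{1}{2|F|} \sum_{a \in F} e(-\xi \cdot a^2).$$

On the other hand, we may square

$$\begin{aligned}
\Big| \sum_{a \in F} e(-\xi \cdot a^2) \Big|^2 = \Big| \sum_{a \in F} e(\xi \cdot a^2) \Big|^2
&= \sum_{a, b \in F} e(\xi \cdot (a^2 - b^2)) \\
&= \sum_{a, h \in F} e(\xi \cdot (a^2 - (a + h)^2)) \\
&= \sum_{h \in F} e(-\xi \cdot h^2) \sum_{a \in F} e(-\xi \cdot 2ah).
\end{aligned}$$

If $h \ne 0$, then $2h \ne 0$, and $\sum_{a \in F} e(-\xi \cdot 2ah) = \sum_{c \in F} e(\xi \cdot c) = 0$ thanks to
Lemma 4.5. On the other hand, if $h = 0$, then $\sum_{a \in F} e(-\xi \cdot 2ah) = |F|$. We conclude
that $|\sum_{a \in F} e(\xi \cdot a^2)|^2 = |F|$, and the claim follows.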
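Both the Gauss sum identity in the proof and the resulting bound of Lemma 4.14 can be confirmed numerically for a small prime field. This sketch assumes $F = \mathbf{Z}_{23}$ with $\phi(x) := x/p$ as in Example 4.2; the variable names are illustrative.

```python
import cmath

def e(theta):
    """Exponential map e(theta) = exp(2*pi*i*theta), as in (4.1)."""
    return cmath.exp(2j * cmath.pi * theta)

p = 23                                         # an odd prime, so Z_p is a field
squares = {(a * a) % p for a in range(p)}

# |sum_a e(xi * a^2 / p)|^2 should equal p for every non-zero frequency xi.
gauss_sums_ok = all(
    abs(abs(sum(e(xi * a * a / p) for a in range(p))) ** 2 - p) < 1e-9
    for xi in range(1, p)
)

# Fourier bias of the squares, compared with the bound from Lemma 4.14.
bias = max(abs(sum(e(-xi * x / p) for x in squares)) / p for xi in range(1, p))
bound = 1 / (2 * p ** 0.5) + 1 / (2 * p)

print(gauss_sums_ok, bias <= bound + 1e-12)
```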














By combining this lemma with Lemma 4.13, one immediately obtains 


Corollary 4.15 Let $F$ be a finite field of odd order, and let $A = F^{\wedge 2}$ be the set of
squares in $F$. Then $kA = F$ for all $k \ge 3$. Indeed, for any $x \in F$, the number of
representations of $x$ as a sum $x = a_1 + \cdots + a_k$ with $a_1,\ldots,a_k \in F^{\wedge 2}$ is
$(2^{-k} + O_k(|F|^{-1/2}))|F|^{k-1}$.


We leave the verification of this corollary as an exercise. It shows that the sum
sets $kA$ are more or less uniformly distributed for $k \ge 3$. Note that when $k = 2$, one
can still prove that $2A = F$, but the sum sets can be quite irregular; for instance,
if $-1$ is not a square in $F$, then $0$ only has one representation as the sum of two
elements in $F^{\wedge 2}$.




We now present a lemma which asserts, roughly speaking, that if $B$ is a
randomly-chosen subset of $A$, then $\|B\|_u$ is approximately equal to $\tau \|A\|_u$; thus
the Fourier bias decreases proportionally when passing to random subsets.


Lemma 4.16 [149] Let $A$ be an additive set in a finite additive group $Z$, and let
$0 < \tau < 1$. Let $B$ be a random subset of $A$ defined by letting the events $a \in B$ be
independent with probability $\tau$. Then for any $\lambda > 0$ we have


P(\l|Bllu — THAllul = 40) < 41Z| max (e™??/8, e7242), 
where $\sigma^2 := |A|\tau(1 - \tau)/|Z|^2$.


The lemma is an easy consequence of Chernoff’s inequality and is left as 
an exercise. Applying it with à = C log"? |Z| for some large C, and assuming 
|A|t(. — t) > log |Z|, we see in particular that 


P(I| Bll. = TIHAllu + O (ø log”? |Z|)) = 1 — O(Z|7"°) 


(for instance). In particular if we set $A = Z$ then we have
$\|B\|_u = O\big((\tau(1 - \tau) \log|Z| / |Z|)^{1/2}\big)$ with high probability; thus random subsets of $Z$ tend
to be extremely uniform. Note that $\mathbf{P}_Z(B) \approx \tau$ with high probability, thanks to
Corollary 1.10.

A major application of Fourier bias is in the study of arithmetic progressions 
of length 3. We will study this application in detail in Chapter 10. 


Exercises 


4.3.1 Let $A$ be a subset of a finite additive group $Z$. Show that $\|A\|_u = 0$ if and
only if $A = Z$ or $A = \emptyset$.

4.3.2 Let $A$ be a subset of a finite additive group $Z$. Show that $\|A\|_u =
\|-A\|_u = \|A + h\|_u = \|Z \setminus A\|_u$ for any $h \in Z$. More generally, if
$\phi : Z \to Z'$ is any isomorphism from one additive group to another, show
that $\|\phi(A)\|_u = \|A\|_u$. In a similar spirit, show that the Fourier bias of a
set $A$ does not depend on the choice of symmetric non-degenerate bilinear
form.

4.3.3 Let $A, B$ be disjoint subsets of a finite additive group $Z$. Show that
$|\|A\|_u - \|B\|_u| \le \|A \cup B\|_u \le \|A\|_u + \|B\|_u$.

4.3.4 Let $A$ be an additive set in a finite additive group $Z$. Show that $\|A\|_u \le
\mathbf{P}_Z(A)$, with equality if and only if $A$ is contained in a coset of a proper
subgroup of $Z$.

4.3.5 Let A and A’ be subsets of finite additive groups Z and Z’ respectively. 
Show that || A x Allu = |[Allull Alu. 


4.3.6 Let $A$ be a subset of a finite additive group $Z$. Show that $\|A\|_u =
\sup_\phi |\langle 1_A, e(\phi) \rangle_{\mathbf{C}^Z}|$, where $\phi : Z \to \mathbf{R}/\mathbf{Z}$ ranges over all non-constant
linear phase functions (as defined in Exercise 4.1.4).

4.3.7 Let $A, B$ be additive sets in a finite additive group $Z$. Show that

$$E(A, B) \le \frac{|A|^2 |B|^2}{|Z|} + \|A\|_u^2 |Z|^2 |B|.$$

Using (2.8), conclude that if $\|A\|_u \le \alpha \mathbf{P}_Z(A)$, then

$$|A + B| \ge \min\left( \frac{1}{2}|Z|, \frac{1}{2}\alpha^{-2}|B| \right). \quad (4.22)$$

Thus $\alpha$-uniform sets tend to expand sum sets by a factor of roughly $\alpha^{-2}$
(unless this is impossible due to the trivial bound $|A + B| \le |Z|$).

4.3.8 Let $A$ be an additive set in a finite additive group $Z$. Show that

$$\|A\|_u^4 \le \frac{1}{|Z|^3} E(A, A) - \mathbf{P}_Z(A)^4 \le \|A\|_u^2 \mathbf{P}_Z(A). \quad (4.23)$$

Thus uniform sets have additive energy $E(A, A)$ close to the minimal
value of $\mathbf{P}_Z(A)^4 |Z|^3$, and vice versa.

4.3.9 Let $A$ be an additive set in a finite additive group $Z$, and let $n \ge 3$ be
an integer. Using Lemma 4.13, show that if $nA \ne Z$, then $\mathbf{P}_Z(A)^{1 + \frac{1}{n-2}} \le
\|A\|_u \le \mathbf{P}_Z(A)$. This estimate is especially useful when $n$ is very large,
as it shows that $1_A$ has a very large non-trivial Fourier coefficient.

4.3.10 Prove Corollary 4.15. Also show the identity $A \cdot 2A = 2A$ and conclude
that $2A = F$ (using the fact that $3A = F$ to show that $2A \ne A$).

4.3.11 Use Chernoff's inequality (in the form of Exercise 1.3.4) to prove
Lemma 4.16.

4.3.12 [149] Let $A, B$ be additive sets in a finite additive group $Z$. Use
Lemma 4.13 to establish the inequality

$$\|S\|_u \ge \mathbf{P}_Z(A)^{1/2} \mathbf{P}_Z(B)^{1/2} \mathbf{P}_Z(S)$$

whenever $S$ is disjoint from $A + B$. In particular, this inequality holds
when $S = Z \setminus (A + B)$. This shows that complements of sum sets are
"hereditarily non-uniform".

4.3.13 Let $A$ be a subset of a cyclic group $\mathbf{Z}_p$ of prime order. Show that for any
arithmetic progression $P$ in $\mathbf{Z}_p$, we have the uniform distribution estimate

$$\mathbf{P}_{\mathbf{Z}_p}(A \cap P) = \mathbf{P}_{\mathbf{Z}_p}(A) \mathbf{P}_{\mathbf{Z}_p}(P) + O(\varepsilon) + O\left( \log\frac{1}{\varepsilon} \, \|A\|_u \right)$$

for any $0 < \varepsilon < 1$. (Hint: apply a change of variables to make $P =
[-N, N]$ for some $N$. Approximate the indicator $1_P$ by something a bit
smoother (smoothed out at scale $\varepsilon p$) and then compute the Fourier expansion.
Apply Plancherel's theorem (4.3) with this smoothed out function
and $1_A - \mathbf{P}_Z(A)$.) This inequality is a crude form of the famous Erdős-Turán
inequality in discrepancy theory, and is related to the Weyl criterion
for uniform distribution modulo one.

4.3.14 Let $A = \mathbf{Z}_p^{\wedge 2}$ be the set of squares in a cyclic group $\mathbf{Z}_p$ of prime order. Show
that for any arithmetic progression $P$ in $\mathbf{Z}_p$, we have

$$|A \cap P| = \frac{1}{2}|P| + O(\sqrt{p} \log p).$$

(Hint: use Lemma 4.14 and the preceding exercise.) This is a special case
of the Pólya-Vinogradov inequality from analytic number theory.

4.3.15 Let $F$ be a finite field, let $Z$ be a vector space over $F$, and let $M : Z \to Z$
be a linear transformation. Show that if $\dim_F(Z) \ge 3$, then there exists
a non-zero $x \in Z$ such that $Mx \cdot x = 0$. (Hint: reduce to the case when
$M$ has full rank, and then modify Lemma 4.14. One can also solve this
problem by purely algebraic methods.)

4.3.16 [160] Let $W$ be a vector space over a finite field $F$ of odd order, and
let $M : W \to W$ be a linear transformation. Show that there exists a
subspace $U$ of $W$ with dimension $\dim_F(U) \ge \frac{1}{2}\dim_F(W) - 3$ such that
$M$ is null on $U$, i.e. $Mx \cdot y = 0$ for all $x, y \in U$. (Hint: take a maximal
space $U$ which is null with respect to $M$. If the orthogonal complement
$U^\perp := \{y \in W : Mx \cdot y = 0 \text{ for all } x \in U\}$ is at least three dimensions
larger than $U$, then use the previous exercise.) For a purely algebraic proof
of this fact, see Exercise 9.4.11.


4.4 Bohr sets 


In many applications of the Fourier-analytic method, one starts with some additive
set $A$ and concludes some information about the Fourier transform $\hat{1}_A$ of $A$ (for
instance, one may obtain some bound on the Fourier bias $\|A\|_u$). One would
then like to pass from this back to some new combinatorial information on the
original set $A$. For some special groups (e.g. finite field geometries $F_p^n$) one can
do this quite directly (see for instance Lemma 10.15). However, to convert Fourier
information on general groups to combinatorial information we need the notion of
a Bohr set (also known as Bohr neighborhoods in the literature). We first define
a "norm" $\|\theta\|_{\mathbf{R}/\mathbf{Z}}$ on the circle group by defining $\|\theta + \mathbf{Z}\|_{\mathbf{R}/\mathbf{Z}} = |\theta|$ whenever
$-1/2 < \theta \le 1/2$; in other words, $\|\theta\|_{\mathbf{R}/\mathbf{Z}}$ is the distance from $\theta$ (or more precisely,
any representative of the coset $\theta$) to the integers. We observe the elementary bounds

$$4\|\theta\|_{\mathbf{R}/\mathbf{Z}} \le |e(\theta) - 1| \le 2\pi \|\theta\|_{\mathbf{R}/\mathbf{Z}} \quad (4.24)$$

which follow from elementary trigonometry and the observation that the sinc
function $\sin(x)/x$ varies between $1$ and $2/\pi$ when $|x| \le \pi/2$.
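The two-sided bound relating the circle norm to $|e(\theta) - 1|$ can be sanity-checked numerically. The following is only a minimal sketch: the helper names `circle_norm`, `e` and `check_bounds` are illustrative and not taken from the text, and it assumes the usual convention $e(\theta) = e^{2\pi i \theta}$.

```python
import cmath
import math

def circle_norm(theta):
    """Distance from the real number theta to the nearest integer."""
    frac = theta % 1.0
    return min(frac, 1.0 - frac)

def e(theta):
    """The standard character e(theta) = exp(2 pi i theta) (assumed convention)."""
    return cmath.exp(2j * math.pi * theta)

def check_bounds(samples=1000):
    """Check 4*||theta|| <= |e(theta) - 1| <= 2*pi*||theta|| on a grid in [-1/2, 1/2]."""
    tol = 1e-12  # small slack for floating-point rounding
    for k in range(samples + 1):
        theta = -0.5 + k / samples
        lower = 4 * circle_norm(theta)
        middle = abs(e(theta) - 1)
        upper = 2 * math.pi * circle_norm(theta)
        if not (lower <= middle + tol and middle <= upper + tol):
            return False
    return True

print(check_bounds())
```

The lower bound is attained at $\theta = \pm 1/2$ and the upper bound is approached as $\theta \to 0$, which is why a small tolerance is used at the endpoints.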


Definition 4.17 (Bohr set) Let $S \subseteq Z$ be a set of frequencies, and let $\rho > 0$. We
define the Bohr set $\mathrm{Bohr}(S, \rho) = \mathrm{Bohr}_Z(S, \rho)$ as

$$\mathrm{Bohr}(S, \rho) := \{x \in Z : \sup_{\xi \in S} \|\xi \cdot x\|_{\mathbf{R}/\mathbf{Z}} < \rho\}.$$

We refer to $S$ as the frequency set of the Bohr set, and $\rho$ as the radius. The quantity
$|S|$ is known as the rank of the Bohr set.
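As a concrete illustration of this definition, here is a small sketch that enumerates a Bohr set in a cyclic group. It assumes $Z = \mathbf{Z}_N$ with the bilinear form $\xi \cdot x = \xi x / N$; the function names `circle_norm` and `bohr_set` are hypothetical, and exact rational arithmetic is used so that the strict inequality is tested without rounding issues.

```python
from fractions import Fraction

def circle_norm(theta):
    """Distance from a rational number theta to the nearest integer."""
    frac = theta % 1
    return min(frac, 1 - frac)

def bohr_set(N, S, rho):
    """Bohr(S, rho) inside Z_N, assuming the bilinear form xi . x = xi * x / N."""
    return {
        x for x in range(N)
        if all(circle_norm(Fraction(xi * x, N)) < rho for xi in S)
    }

# Rank 1: an interval around 0 in Z_12.
print(sorted(bohr_set(12, {1}, Fraction(1, 4))))
# Rank 2: adding the frequency 5 cuts the set down.
print(sorted(bohr_set(12, {1, 5}, Fraction(1, 4))))
```

Because the defining inequality is strict, boundary points such as $x = 3$ in the first example (where $\|3/12\|_{\mathbf{R}/\mathbf{Z}} = 1/4$) are excluded.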


Remark 4.18 Note that if $Z$ is a vector space over a finite field $F$, then every
subspace of $Z$ can be viewed as a Bohr set (with radius $O(1/|F|)$, and rank equal
to the codimension). Thus Bohr sets can be viewed as a generalization of subspaces.
Note that most finite groups $Z$ tend to have very few actual subgroups (the extreme
case being the cyclic groups $\mathbf{Z}_p$ of prime order), so it is convenient to be able to
rely on the much larger class of Bohr sets as a substitute.


Remark 4.19 One way to think of Bohr sets is to consider the embedding of $Z$
into the complex vector space $\mathbf{C}^S$ (and in particular to the standard unit torus inside
$\mathbf{C}^S$) by the multiplicative map $x \mapsto (e(\xi \cdot x))_{\xi \in S}$. A Bohr set is thus the inverse
image of a cube.


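Observe that the $\|\cdot\|_{\mathbf{R}/\mathbf{Z}}$ norm is symmetric and subadditive: $\|-x\|_{\mathbf{R}/\mathbf{Z}} =
\|x\|_{\mathbf{R}/\mathbf{Z}}$ and $\|x + y\|_{\mathbf{R}/\mathbf{Z}} \le \|x\|_{\mathbf{R}/\mathbf{Z}} + \|y\|_{\mathbf{R}/\mathbf{Z}}$. From this we see that the Bohr sets
$\mathrm{Bohr}(S, \rho)$ are symmetric, decreasing in $S$, and increasing in $\rho$ (and fill out the
whole space $Z$ once $\rho > 1/2$); they are always unions of cosets of $S^\perp$, and if $\rho$
is sufficiently small they consist entirely of $S^\perp$. One can also easily verify the
intersection property

$$\mathrm{Bohr}(S, \rho) \cap \mathrm{Bohr}(S', \rho) = \mathrm{Bohr}(S \cup S', \rho)$$

and the addition property

$$\mathrm{Bohr}(S, \rho) + \mathrm{Bohr}(S, \rho') \subseteq \mathrm{Bohr}(S, \rho + \rho').$$

In particular we have

$$k\,\mathrm{Bohr}(S, \rho) \subseteq \mathrm{Bohr}(S, k\rho)$$

for any $k \ge 1$.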
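The symmetry, monotonicity, intersection and addition properties can be checked by brute force in a small cyclic group. This is a sketch under the assumption $Z = \mathbf{Z}_N$ with $\xi \cdot x = \xi x / N$; all function names below are illustrative, not notation from the text.

```python
from fractions import Fraction

def circle_norm(theta):
    """Distance from a rational number theta to the nearest integer."""
    frac = theta % 1
    return min(frac, 1 - frac)

def bohr_set(N, S, rho):
    """Bohr(S, rho) inside Z_N, assuming xi . x = xi * x / N."""
    return {
        x for x in range(N)
        if all(circle_norm(Fraction(xi * x, N)) < rho for xi in S)
    }

def sumset(N, A, B):
    """The sum set A + B inside Z_N."""
    return {(a + b) % N for a in A for b in B}

def check_properties(N, S, S2, rho, rho2):
    """Verify symmetry, intersection, addition and monotonicity for given data."""
    B = bohr_set(N, S, rho)
    symmetric = all((-x) % N in B for x in B)
    intersection = (B & bohr_set(N, S2, rho)) == bohr_set(N, S | S2, rho)
    addition = sumset(N, B, bohr_set(N, S, rho2)) <= bohr_set(N, S, rho + rho2)
    # Decreasing in the frequency set, increasing in the radius.
    monotone = bohr_set(N, S | S2, rho) <= B <= bohr_set(N, S, rho + rho2)
    return symmetric and intersection and addition and monotone

print(check_properties(35, {1, 6}, {10}, Fraction(1, 5), Fraction(1, 10)))
```

Here `<=` on Python sets is the subset relation, so each boolean is a direct transcription of one of the stated inclusions.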
Next, we establish some bounds for the size of Bohr sets. 


Lemma 4.20 (Size bounds) If $S \subseteq Z$ and $\rho > 0$, then we have the lower bound

$$\mathbf{P}_Z(\mathrm{Bohr}(S, \rho)) \ge \rho^{|S|} \quad (4.25)$$

and we have the doubling estimate

$$\mathbf{P}_Z(\mathrm{Bohr}(S, 2\rho)) \le 4^{|S|} \mathbf{P}_Z(\mathrm{Bohr}(S, \rho)). \quad (4.26)$$

This lemma should be compared with the Kronecker approximation theorem
(Corollary 3.25); indeed the two results are very closely related.


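Proof For each $\xi \in S$ let $\theta_\xi$ be an element of $\mathbf{R}/\mathbf{Z}$ chosen independently and
uniformly at random. For any $x \in Z$, one can easily verify that

$$\mathbf{P}(\|\xi \cdot x - \theta_\xi\|_{\mathbf{R}/\mathbf{Z}} < \rho/2 \text{ for all } \xi \in S) = \rho^{|S|}.$$

Summing this over all $x \in Z$ using linearity of expectation (1.4), we conclude

$$\mathbf{E}|\{x \in Z : \|\xi \cdot x - \theta_\xi\|_{\mathbf{R}/\mathbf{Z}} < \rho/2 \text{ for all } \xi \in S\}| \ge \rho^{|S|}|Z|$$

and thus there exists a choice of $\theta_\xi$ such that

$$|\{x \in Z : \|\xi \cdot x - \theta_\xi\|_{\mathbf{R}/\mathbf{Z}} < \rho/2 \text{ for all } \xi \in S\}| \ge \rho^{|S|}|Z|. \quad (4.27)$$

Now observe from the triangle inequality that if $x, x'$ lie in the above set, then
$x - x'$ lies in $\mathrm{Bohr}(S, \rho)$. The claim (4.25) follows.

Now we prove (4.26). By a limiting argument we may replace $2\rho$ by $2\rho - \varepsilon$ on
the left-hand side for some small $\varepsilon > 0$. Observe that we can cover the interval $\{\theta \in
\mathbf{R}/\mathbf{Z} : \|\theta\|_{\mathbf{R}/\mathbf{Z}} < 2\rho - \varepsilon\}$ by four intervals of the form $\{\theta \in \mathbf{R}/\mathbf{Z} : \|\theta - \theta_0\|_{\mathbf{R}/\mathbf{Z}} <
\rho/2\}$. We can thus cover $\mathrm{Bohr}(S, 2\rho)$ by $4^{|S|}$ sets of the type appearing in the
left-hand side of (4.27). The claim follows by arguing as before.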
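The density lower bound and the doubling estimate for Bohr sets can be observed numerically on small examples. The sketch below assumes $Z = \mathbf{Z}_N$ with $\xi \cdot x = \xi x / N$ and radii at most $1/2$; the names `density` and `check_size_bounds` are illustrative only.

```python
from fractions import Fraction

def circle_norm(theta):
    """Distance from a rational number theta to the nearest integer."""
    frac = theta % 1
    return min(frac, 1 - frac)

def bohr_set(N, S, rho):
    """Bohr(S, rho) inside Z_N, assuming xi . x = xi * x / N."""
    return {
        x for x in range(N)
        if all(circle_norm(Fraction(xi * x, N)) < rho for xi in S)
    }

def density(N, S, rho):
    """The density P_Z(Bohr(S, rho)) as an exact fraction."""
    return Fraction(len(bohr_set(N, S, rho)), N)

def check_size_bounds(N, S, rho):
    """Check density >= rho^|S| and density at radius 2*rho <= 4^|S| * density."""
    d = len(S)
    lower = density(N, S, rho) >= rho ** d
    doubling = density(N, S, 2 * rho) <= 4 ** d * density(N, S, rho)
    return lower and doubling

for rho in (Fraction(1, 20), Fraction(1, 10), Fraction(1, 5), Fraction(1, 4)):
    print(rho, density(101, {1, 10, 37}, rho), check_size_bounds(101, {1, 10, 37}, rho))
```

In typical examples both bounds hold with a lot of room; the worst cases for the lower bound are frequency sets that behave independently, as in the vector space setting.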














We have already mentioned that subspaces of a vector space are one example
of a Bohr set. Progressions can form another example; for instance intervals such
as $(-N, N)$ in a cyclic group $\mathbf{Z}_N$ can easily be seen to be a Bohr set of rank
1. We can combine these two examples by introducing the concept of a coset
progression.
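To see the rank 1 example concretely, one can compare a symmetric interval with the Bohr set generated by the single frequency 1. This sketch uses its own names: the modulus is called `N`, the half-length of the interval is called `M` (with $M \le N/2$), and the bilinear form is assumed to be $\xi \cdot x = \xi x / N$.

```python
from fractions import Fraction

def circle_norm(theta):
    """Distance from a rational number theta to the nearest integer."""
    frac = theta % 1
    return min(frac, 1 - frac)

def bohr_set(N, S, rho):
    """Bohr(S, rho) inside Z_N, assuming xi . x = xi * x / N."""
    return {
        x for x in range(N)
        if all(circle_norm(Fraction(xi * x, N)) < rho for xi in S)
    }

def interval(N, M):
    """The open symmetric interval (-M, M) = {-(M-1), ..., M-1}, reduced mod N."""
    return {x % N for x in range(-M + 1, M)}

# The interval (-4, 4) in Z_20 is the rank 1 Bohr set of radius 4/20.
print(bohr_set(20, {1}, Fraction(4, 20)) == interval(20, 4))
```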


Definition 4.21 (Coset progressions) [157] A coset progression in an additive
group $Z$ is any set of the form $P + H$ where $P$ is a progression and $H$ is a finite
subgroup of $Z$. We say that the coset progression $P + H$ is proper if $P$ is proper
and $|P + H| = |P||H|$ (i.e. all the sums in $P + H$ are distinct). We say that a
coset progression $P + H$ has rank $d$ if the component $P$ has rank $d$. We say that
$P + H$ is symmetric if $P$ has the form $P = (-N, N) \cdot v$.


Of course, Corollary 3.8 shows that every coset progression can also be viewed
as an ordinary progression, but possibly of much larger rank. If however $Z$ is a
cyclic group of prime order, then $H$ will either be trivial or equal to the whole
space, and will thus increase the rank by at most 1. Indeed we can view vector
spaces over small finite fields on the one hand, and cyclic groups of prime order
on the other, as the two extremes of additive behavior for finite groups $Z$.
Now we relate Bohr sets of rank $d$ with coset progressions of rank $d$.


Lemma 4.22 (Bohr sets contain large coset progressions) [160] Let
$\mathrm{Bohr}(S, \rho)$ be a Bohr set of rank $d$ in $Z$ with $0 < \rho < \frac{1}{2}$. Then there exists a proper
symmetric coset progression $P + H$ of rank $0 \le d' \le d$, obeying the inclusions

$$\mathrm{Bohr}(S, d^{-2d}\rho) \subseteq P + H \subseteq \mathrm{Bohr}(S, \rho). \quad (4.28)$$

In particular, from Lemma 4.20 we have

$$\mathbf{P}_Z(P + H) \ge \rho^d d^{-2d^2}. \quad (4.29)$$

Furthermore we have $H = S^\perp$.


Proof Let $\phi : Z \to (\mathbf{R}/\mathbf{Z})^S$ be the group homomorphism $\phi(x) := (\xi \cdot x)_{\xi \in S}$.
Observe that $\phi(Z)$ is a finite subgroup of the torus $(\mathbf{R}/\mathbf{Z})^S$, and that $\mathrm{Bohr}(S, \rho)$
contains the inverse image of the cube $Q := \{(y_\xi)_{\xi \in S} \in \mathbf{R}^S : |y_\xi| < \rho\} \subset \mathbf{R}^S$ (which
we identify with its projection in $(\mathbf{R}/\mathbf{Z})^S$) under $\phi$.

Let $\Gamma \subset \mathbf{R}^S$ be the lattice $\phi(Z) + \mathbf{Z}^S$. Though it is a slight abuse of notation, we
consider $\phi(Z) \cap Q$ to be the same as $\Gamma \cap Q$. Applying Lemma 3.36, we can find
a progression $P := (-L, L) \cdot w$ for some linearly independent $w_1, \ldots, w_{d'} \in \Gamma$
with $0 \le d' \le d$ such that

$$\Gamma \cap (d^{-2d} \cdot Q) \subseteq P \subseteq \Gamma \cap Q.$$

Since the $w_j$ are independent, $P$ is necessarily proper. The claim now follows by
setting $v_j$ to be an arbitrary element of $\phi^{-1}(w_j)$ for each $1 \le j \le d'$, and setting
$H$ equal to the kernel of $\phi$, which is of course just $S^\perp$.














In the case of a cyclic group, we can dispense with the group $H$ and sharpen
the constants somewhat (though at the cost of losing the first inclusion in (4.28)):


Proposition 4.23 Let $Z = \mathbf{Z}_N$ be a cyclic group, and let $\mathrm{Bohr}(S, \rho)$ be a Bohr set
of rank $d$ with $0 < \rho < \frac{1}{2}$. Then $\mathrm{Bohr}(S, \rho)$ contains a symmetric proper progression
$P$ of rank $d$ and cardinality

$$|P| \ge \frac{\rho^d}{d^d} N.$$

Furthermore we may choose $P$ to be symmetric (i.e. $P = -P$).


Proof The main tool here will be Minkowski's second theorem. We use the
standard bilinear form $\xi \cdot x = \xi x / N$, and write $S = \{\xi_1, \ldots, \xi_d\}$. Let $\alpha \in \mathbf{R}^d$ be
the vector $\alpha := (\xi_1 / N, \ldots, \xi_d / N)$ and let $\Gamma$ be the lattice $\mathbf{Z} \cdot \alpha + \mathbf{Z}^d$; this clearly has
full rank, and by (3.12)

$$\mathrm{mes}(\mathbf{R}^d / \Gamma) = \mathrm{mes}(\mathbf{R}^d / \mathbf{Z}^d) / |\Gamma / \mathbf{Z}^d| \ge 1/N.$$

Let $Q$ be the cube

$$Q := \{(x_1, \ldots, x_d) \in \mathbf{R}^d : |x_j| < \rho \text{ for all } 1 \le j \le d\},$$

and let $0 < \lambda_1 \le \cdots \le \lambda_d$ be the successive minima of $Q$ with respect to $\Gamma$, with
a corresponding directional basis $v_1, \ldots, v_d \in \Gamma$ as given by Theorem 3.30. In
particular we see that every coordinate of $v_j$ has magnitude at most $\lambda_j \rho$.

Let $1 \le j \le d$ be arbitrary. Since $v_j \in \Gamma$, we see from the definition of $\Gamma$
that there exists $w_j \in \mathbf{Z}_N$ such that $v_j \in \alpha w_j + \mathbf{Z}^d$. In particular we see that
$\|\xi_i \cdot w_j\|_{\mathbf{R}/\mathbf{Z}} \le \lambda_j \rho$ for all $1 \le i, j \le d$. Set $w := (w_1, \ldots, w_d)$. Now we let
$M_j := \lceil 1/(\lambda_j d) \rceil$ and let $M := (M_1, \ldots, M_d)$; we now claim that the progression
$P := (-M, M) \cdot w$ is proper and lies in $\mathrm{Bohr}(S, \rho)$ (it is clearly symmetric). Let
us first verify that $P \subseteq \mathrm{Bohr}(S, \rho)$. If $n = (n_1, \ldots, n_d) \in (-M, M)$, then for any
$1 \le i \le d$ we have

$$\|\xi_i \cdot (n \cdot w)\|_{\mathbf{R}/\mathbf{Z}} \le \sum_{j=1}^d |n_j| \|\xi_i \cdot w_j\|_{\mathbf{R}/\mathbf{Z}} < \sum_{j=1}^d \frac{1}{d\lambda_j} \lambda_j \rho = \rho$$

and hence $n_1 w_1 + \cdots + n_d w_d \in \mathrm{Bohr}(S, \rho)$. This proves the inclusion $P \subseteq
\mathrm{Bohr}(S, \rho)$.

Now we show that $P$ is proper. Suppose for contradiction that there exist distinct
$n, n' \in (-M, M)$ such that $n \cdot w = n' \cdot w$; setting $\tilde{n} := n - n' \in (-2M, 2M)$, we
thus see that $\tilde{n} \cdot w = 0$. In particular, $(\tilde{n} \cdot v)_i$ is an integer for each $i$. On the other
hand, by arguing as before, we see that

$$|(\tilde{n} \cdot v)_i| \le \sum_{j=1}^d |\tilde{n}_j| |(v_j)_i| < \sum_{j=1}^d \frac{2}{d\lambda_j} \lambda_j \rho = 2\rho.$$

Since $\rho < 1/2$, we conclude that $(\tilde{n} \cdot v)_i = 0$ for all $i$, and thus $\sum_j \tilde{n}_j v_j = 0$. But
this contradicts the linear independence of the directional basis $v_1, \ldots, v_d$. Thus
$P$ is proper.

Finally, the cardinality of the proper progression $P$ is

$$|P| = \prod_{j=1}^d (2M_j - 1) \ge \prod_{j=1}^d \frac{1}{d\lambda_j}$$

and the claim follows from Minkowski's second theorem.





One undesirable feature of Bohr sets of large rank $d$ is that they have large
doubling constant: (4.26) suggests that $\mathrm{Bohr}(S, \rho) + \mathrm{Bohr}(S, \rho)$ can be $4^d$ times
larger than $\mathrm{Bohr}(S, \rho)$. A useful observation of Bourgain [39] is that if one
considers an imbalanced sum $\mathrm{Bohr}(S, \rho) + \mathrm{Bohr}(S, \rho')$, with $\rho'$ much smaller than
$\rho$, then it is still possible for $\mathrm{Bohr}(S, \rho) + \mathrm{Bohr}(S, \rho')$ to be close to $\mathrm{Bohr}(S, \rho)$.
This intuition is formalized by the notion of a regular Bohr set.


Definition 4.24 (Regular Bohr sets) A Bohr set $\mathrm{Bohr}(S, \rho)$ of rank $d$ is said to
be regular if one has the estimate

$$(1 - 100d|\kappa|) \mathbf{P}_Z(\mathrm{Bohr}(S, \rho)) \le \mathbf{P}_Z(\mathrm{Bohr}(S, (1 + \kappa)\rho)) \le (1 + 100d|\kappa|) \mathbf{P}_Z(\mathrm{Bohr}(S, \rho))$$

whenever $|\kappa| \le \frac{1}{100d}$.

Not all Bohr sets are regular. However, it turns out that every Bohr set is "close"
to a regular one:


Lemma 4.25 (Regular Bohr sets are ubiquitous) [39] Let $S$ be a non-empty
additive set and let $0 < \varepsilon < 1$. Then there exists $\rho \in [\varepsilon, 2\varepsilon]$ such that $\mathrm{Bohr}(S, \rho)$
is regular.


Proof Let $f : [0, 1] \to \mathbf{R}$ be the function $f(a) := \frac{1}{d} \log_2 \mathbf{P}_Z(\mathrm{Bohr}(S, 2^a \varepsilon))$.
Observe that $f$ is non-decreasing in $a$, and from Lemma 4.20 we have $f(1) -
f(0) \le \log_2 5$.

Suppose we could find $0.1 \le a \le 0.9$ such that $|f(a') - f(a)| \le 20|a - a'|$ for
all $|a' - a| \le 0.1$. Then it is easy to see that $\mathrm{Bohr}(S, 2^a \varepsilon)$ is regular. Thus, it suffices to
obtain an $a$ with this property. This can be done directly from the Hardy-Littlewood
maximal inequality (applied to the Lebesgue-Stieltjes measure $df$), or as follows.
If no such $a$ exists, then for every $0.1 \le a \le 0.9$ there exists a real interval $I$ of
length at most $0.1$ and with one endpoint equal to $a$, such that $\int_I df > \int_I 20 \, dx$.
These intervals cover $\{a : 0.1 \le a \le 0.9\}$, which has measure $0.8$. By the Vitali
covering lemma (see exercises), one can thus find a finite subcollection of
disjoint intervals $I_1, \ldots, I_n$ of total length $|I_1| + \cdots + |I_n| \ge 0.8/5$ (say). But
then we have

$$\log_2 5 \ge \int_0^1 df \ge \sum_{i=1}^n \int_{I_i} df > \sum_{i=1}^n \int_{I_i} 20 \, dx \ge \frac{0.8}{5} \times 20,$$

a contradiction.














We shall make a crucial use of this lemma in proving Bourgain’s quantitative 
version of Roth’s theorem in Section 10.4. 


Exercises

4.4.1 Show that if $0 < \rho < 1/6$ and $|S| \ge 1$, then $|\hat{1}_{\mathrm{Bohr}(S, \rho)}(\xi)| \ge
\frac{1}{2} \mathbf{P}_Z(\mathrm{Bohr}(S, \rho))$ for all $\xi \in S$. In particular Bohr sets are extremely
non-uniform: $\|\mathrm{Bohr}(S, \rho)\|_u \ge \frac{1}{2} \mathbf{P}_Z(\mathrm{Bohr}(S, \rho))$. By applying Plancherel's
theorem, conclude the additional bound $\mathbf{P}_Z(\mathrm{Bohr}(S, \rho)) \le \frac{4}{|S|}$.

4.4.2 Give examples to show that the density $\mathbf{P}_Z(\mathrm{Bohr}(S, \rho))$ of a Bohr set can
be as low as $\Theta(\rho)^{|S|}$, and as large as $\Theta(1/|S|)$, even when $\rho$ is small and
$|S|$ is large. Thus the bounds in (4.25) and the preceding exercise cannot
be significantly improved.

4.4.3 Establish the bound $\mathbf{P}_Z(\mathrm{Bohr}(S, k\rho)) \le O(k)^{|S|} \mathbf{P}_Z(\mathrm{Bohr}(S, \rho))$ for any
$k \ge 1$. Using the Ruzsa covering lemma (Lemma 2.14), conclude that one
can cover $\mathrm{Bohr}(S, k\rho)$ by $O(k)^{|S|}$ translates of $\mathrm{Bohr}(S, \rho)$. In particular,
in the notation of Definition 2.25, $\mathrm{Bohr}(S, \rho)$ is a $O(1)^{|S|}$-approximate
group.

4.4.4 In the setting of Lemma 4.22, show that $\mathrm{Bohr}(S, \rho)$ can be covered by
O(d)” translates of $P + H$.

4.4.5 Show that a Bohr set $\mathrm{Bohr}(S, \rho)$ of rank $d$ always contains an arithmetic
progression of length $\Omega(|\mathrm{Bohr}(S, \rho)|^{1/d})$ and non-zero step size.
(Hint: if $|\mathrm{Bohr}(S, \rho)|^{1/d}$ is large, use the preceding exercise to show
that $\mathrm{Bohr}(S, \rho/k)$ contains a non-zero element for some integer $k =
\Omega(|\mathrm{Bohr}(S, \rho)|^{1/d})$.)

4.4.6 [160] Let $A$ be an additive set in $Z$ that contains $0$. Show that there exists a
set $S$ of frequencies with $|S| \le 1 + \log_2 |A|$ such that $A \cap$ Bohr(S, V2) $=
\{0\}$. (Hint: choose $1 + \lfloor \log_2 |A| \rfloor$ frequencies randomly and independently
(allowing for collisions) and use the first moment method.)

4.4.7 (Vitali covering lemma) Let $\mathcal{I}$ be a finite collection of intervals in the real
line. Show that there exist a subcollection $I_1, \ldots, I_n$ of these intervals
whose interiors are disjoint, and such that $\sum_{i=1}^n |I_i| \ge \frac{1}{3} \mathrm{mes}(\bigcup_{I \in \mathcal{I}} I)$.
(Hint: use a greedy algorithm, picking the largest intervals first.) By being
more sophisticated in the argument, improve the constant $\frac{1}{3}$ to $\frac{1}{2}$. (Hint: eliminate nested
intervals, and then move greedily from left to right to cover $\bigcup_{I \in \mathcal{I}} I$ by
two families of interior-disjoint intervals.)

4.4.8 (Hardy-Littlewood maximal inequality) Let $\mu$ be a non-negative finite
measure on the real line, and let $M\mu$ denote the Hardy-Littlewood maximal
function $M\mu(x) := \sup_{r > 0} \frac{1}{2r} \mu(\{y : x - r \le y \le x + r\})$. (It can be
verified that $M\mu$ is a measurable function.) Using the Vitali covering
lemma, establish the distributional inequality

$$\mathrm{mes}(\{x : M\mu(x) \ge \lambda\}) \le \frac{2}{\lambda} \mu(\mathbf{R}).$$


4.5 $\Lambda(p)$ constants, $B_h[g]$ sets, and dissociated sets


In Section 4.3 we discussed one Fourier-analytic characteristic of an additive set
$A$ in a finite additive group $Z$, namely its linear bias. In this section we discuss a
rather different characteristic, namely the $\Lambda(p)$ constants of a set $S$ of frequencies.
These constants measure how "dissociated" or "Sidon-like" a set¹ $S$ is; in more
practical terms, the $\Lambda(p)$ constants quantify the independence of the characters
associated to $S$ in a certain $L^p(Z)$ sense. These constants can be used to obtain
precise control on the arithmetic structure of $S$, for instance in controlling iterated
sum sets of $S$. One feature of these constants is that they are stable under passage
to subsets, thus $\Lambda(p)$ constants will also control iterated sum sets of subsets $S'$ of
$S$. This stability (which is not present in the Fourier bias, unless one takes random
subsets as in Lemma 4.16) is useful for a number of applications.
We begin with the formal definition of the $\Lambda(p)$ constants.


Definition 4.26 ($\Lambda(p)$ constants) Let $S$ be an additive set in a finite² additive
group $Z$, and let $2 \le p \le \infty$. We define the $\Lambda(p)$ constant of $S$, denoted $\|S\|_{\Lambda(p)}$,
to be the best constant such that the inequality

$$\Bigl\| \sum_{\xi \in S} c(\xi) e(\xi \cdot x) \Bigr\|_{L^p(Z)} \le \|S\|_{\Lambda(p)} \|c\|_{l^2(S)} \quad (4.30)$$

holds for all sequences $c : S \to \mathbf{C}$ of complex numbers.


One can easily establish the bound

$$\|S\|_{\Lambda(p)} \le |S|^{\frac{1}{2} - \frac{1}{p}} \quad (4.31)$$

for $2 \le p \le \infty$, with equality at the endpoints $p = 2, \infty$; see Exercise 4.5.6. This
exercise indicates that largeness of $\Lambda(p)$ constants is correlated to strong additive
structure of $S$. At the other extreme, we now show that smallness of $\Lambda(p)$ constants
is correlated to strong lack of additive structure of $S$.


Definition 4.27 ($B_h$ sets) Let $h \ge 2$. A non-empty subset $S$ of an additive group
$Z$ is a $B_h$ set if for any $\xi_1, \ldots, \xi_h, \eta_1, \ldots, \eta_h \in S$, one has $\xi_1 + \cdots + \xi_h = \eta_1 +
\cdots + \eta_h$ if and only if $(\xi_1, \ldots, \xi_h)$ is a permutation of $(\eta_1, \ldots, \eta_h)$. We say $S$ is a
Sidon set if it is a $B_2$ set.


These sets are the $g = 1$ version of the $B_h[g]$ sets, encountered in Section 1.7.1;
Sidon sets were also briefly mentioned in Section 2.2. Note that we do not bother
with the notion of a $B_1$ set, since every set is trivially a $B_1$ set.


1 Here, we use "Sidon set" to denote a set whose pairwise sums are all disjoint. There is another, more
Fourier-analytic, notion of a Sidon set related to $\Lambda(p)$ constants which we will not discuss here.

2 One can also define the concept of a $\Lambda(p)$ constant for subsets of the integers, or more general
additive groups, but we will not need to do so in this book.


Example 4.28 For any $M \ge 1$, the set $S := \{0\} \cup (M^{\wedge \mathbf{N}}) = \{0, 1, M, M^2, \ldots\}$ is
a $B_h$ set in $\mathbf{Z}$ if and only if $h < M$. In particular, the powers of 2 form a Sidon set.
One can of course truncate these examples to finite additive groups such as $\mathbf{Z}_N$;
note that any non-empty subset of a $B_h$ set is also a $B_h$ set.
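The $B_h$ property is easy to test by brute force on truncated examples, by checking that all sums of $h$ elements (taken as multisets) are distinct. This is a minimal sketch: the function name `is_Bh` and the optional `modulus` argument are hypothetical conveniences, not notation from the text.

```python
from itertools import combinations_with_replacement

def is_Bh(S, h, modulus=None):
    """Return True if all h-fold sums of elements of S (as multisets) are distinct.

    If modulus is given, sums are compared in Z_modulus; otherwise in the integers.
    """
    seen = set()
    for combo in combinations_with_replacement(sorted(S), h):
        total = sum(combo)
        if modulus is not None:
            total %= modulus
        if total in seen:
            return False  # two different multisets share a sum
        seen.add(total)
    return True

print(is_Bh({1, 2, 4, 8}, 2))       # powers of 2, without 0
print(is_Bh({0, 1, 3, 9, 27}, 2))   # M = 3, h = 2 < M
print(is_Bh({0, 1, 3, 9, 27}, 3))   # M = 3, h = 3: 1 + 1 + 1 = 3 + 0 + 0
```

Enumerating multisets rather than ordered tuples builds the "up to permutation" clause of the definition directly into the check.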


Proposition 4.29 Let $S$ be a non-empty subset of a finite additive group $Z$. Then
we have

$$\|S\|_{\Lambda(4)} \ge \left( 2 - \frac{1}{|S|} \right)^{1/4} \quad (4.32)$$

with equality holding if and only if $S$ is a Sidon set. More generally, if $h \ge 1$, then
there exists a number $1 \le \alpha(h, |S|) \le (h!)^{1/2h}$ depending on $h$ and $|S|$ such that
$\|S\|_{\Lambda(2h)} = \alpha(h, |S|)$ when $S$ is a $B_h$ set, and $\|S\|_{\Lambda(2h)} > \alpha(h, |S|)$ otherwise.


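Proof We first prove (4.32). By testing (4.30) with $c(\xi)$ identically equal to 1, it
will suffice to show that

$$\Bigl\| \sum_{\xi \in S} e(x, \xi) \Bigr\|_{L^4(Z)}^4 \ge \left( 2 - \frac{1}{|S|} \right) |S|^2.$$

The left-hand side can be expanded as

$$\sum_{\xi_1, \xi_2, \eta_1, \eta_2 \in S} \mathbf{E}_{x \in Z} \, e((\xi_1 + \xi_2 - \eta_1 - \eta_2) \cdot x).$$

By Lemma 4.5 this simplifies to

$$|\{\xi_1, \xi_2, \eta_1, \eta_2 \in S : \xi_1 + \xi_2 = \eta_1 + \eta_2\}|.$$

Clearly $\xi_1 + \xi_2$ will equal $\eta_1 + \eta_2$ when $(\xi_1, \xi_2)$ is a permutation of $(\eta_1, \eta_2)$, so
this expression is at least as large as

$$\sum_{\xi_1, \xi_2, \eta_1, \eta_2 \in S : \{\xi_1, \xi_2\} = \{\eta_1, \eta_2\}} 1 = 2|S|(|S| - 1) + |S| = \left( 2 - \frac{1}{|S|} \right) |S|^2$$

as claimed. Note that this argument also shows that the inequality in (4.32) is strict
if $S$ is not a Sidon set, since then we have additional terms coming from pairs
$(\xi_1, \xi_2)$ and $(\eta_1, \eta_2)$ which are not permutations of each other.

Now suppose that $S$ is a Sidon set. To prove equality in (4.32) it suffices to
show that

$$\Bigl\| \sum_{\xi \in S} c(\xi) e(x, \xi) \Bigr\|_{L^4(Z)}^4 \le 2 - \frac{1}{|S|}$$

assuming the normalization $\sum_{\xi \in S} |c(\xi)|^2 = 1$. The left-hand side can be expanded
as

$$\sum_{\xi_1, \xi_2, \eta_1, \eta_2 \in S} c(\xi_1) c(\xi_2) \overline{c(\eta_1)} \, \overline{c(\eta_2)} \, \mathbf{E}_{x \in Z} \, e((\xi_1 + \xi_2 - \eta_1 - \eta_2) \cdot x)$$

which as before simplifies to

$$\sum_{\xi_1, \xi_2, \eta_1, \eta_2 \in S : \xi_1 + \xi_2 = \eta_1 + \eta_2} c(\xi_1) c(\xi_2) \overline{c(\eta_1)} \, \overline{c(\eta_2)}.$$

Since $S$ is a Sidon set, $(\eta_1, \eta_2)$ must be a permutation of $(\xi_1, \xi_2)$. Splitting into the
cases $\xi_1 = \xi_2$ and $\xi_1 \ne \xi_2$, we can thus rewrite the previous expression as

$$\sum_{\xi \in S} |c(\xi)|^4 + 2 \sum_{\xi_1, \xi_2 \in S : \xi_1 \ne \xi_2} |c(\xi_1)|^2 |c(\xi_2)|^2$$

which by the normalization $\sum_{\xi \in S} |c(\xi)|^2 = 1$ can be written as

$$2 - \sum_{\xi \in S} |c(\xi)|^4.$$

But from Cauchy-Schwarz and the normalization $\sum_{\xi \in S} |c(\xi)|^2 = 1$ we have
$\sum_{\xi \in S} |c(\xi)|^4 \ge 1/|S|$, and the claim follows.
The general case $h > 2$ is similar but is left to Exercise 4.5.9.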
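The key identity in this proof, that the fourth power of the $L^4(Z)$ norm of $\sum_{\xi \in S} e(\xi \cdot x)$ counts quadruples with $\xi_1 + \xi_2 = \eta_1 + \eta_2$, can be checked numerically. The sketch below assumes $Z = \mathbf{Z}_N$ with $\xi \cdot x = \xi x / N$; the function names are illustrative.

```python
import cmath
import math

def fourth_moment(N, S):
    """E_{x in Z_N} |sum_{xi in S} e(xi x / N)|^4, computed numerically."""
    total = 0.0
    for x in range(N):
        s = sum(cmath.exp(2j * math.pi * xi * x / N) for xi in S)
        total += abs(s) ** 4
    return total / N

def additive_quadruples(N, S):
    """Number of (xi1, xi2, eta1, eta2) in S^4 with xi1 + xi2 = eta1 + eta2 mod N."""
    return sum(
        1
        for a in S for b in S for c in S for d in S
        if (a + b - c - d) % N == 0
    )

sidon = [1, 2, 4, 8]        # a Sidon set in Z_31
interval = [0, 1, 2, 3]     # not a Sidon set
for S in (sidon, interval):
    k = len(S)
    print(additive_quadruples(31, S), fourth_moment(31, S), (2 - 1 / k) * k ** 2)
```

For the Sidon set the count equals $(2 - 1/|S|)|S|^2$ exactly, while the interval has strictly more additive quadruples, in line with the strictness claim.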














Another quantification of the heuristic that large $\Lambda(p)$ constants correspond
to strong additive structure is given by


Lemma 4.30 Let $S$ be a non-empty subset of a finite additive group $Z$, and let
$h \ge 1$. Then we have

$$|h_1 S - h_2 S| \ge \frac{|S|^h}{\|S\|_{\Lambda(2h)}^{2h}}$$

whenever $h_1, h_2 \ge 0$ are such that $h_1 + h_2 = h$. In particular we have

$$|hS| \ge \frac{|S|^h}{\|S\|_{\Lambda(2h)}^{2h}}.$$

Remark 4.31 This lemma shows that if $S$ has a small $\Lambda(2h)$ constant, then not
only do the sum sets $hS$ become very large, but so do the sum sets $hS'$ of all subsets
$S'$ of $S$, thanks to the monotonicity of $\Lambda(p)$ constants. The converse statement is
also true up to logarithmic factors; see exercises. Thus $\Lambda(2h)$ constants measure
the failure of $S$, or any of its subsets, to have good closure properties under $h$-fold
sums.


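Proof From (4.30) with $p := 2h$, and $c(\xi)$ set identically equal to 1, we have

$$\Bigl\| \sum_{\xi \in S} e(\xi \cdot x) \Bigr\|_{L^{2h}(Z)}^{2h} \le \|S\|_{\Lambda(2h)}^{2h} |S|^h.$$

The left-hand side is equal to

$$\Bigl\| \Bigl( \sum_{\xi \in S} e(x, \xi) \Bigr)^{h_1} \Bigl( \sum_{\xi \in -S} e(x, \xi) \Bigr)^{h_2} \Bigr\|_{L^2(Z)}^2$$

since $e(x, -\xi)$ is the conjugate of $e(x, \xi)$. We can expand

$$\Bigl( \sum_{\xi \in S} e(x, \xi) \Bigr)^{h_1} \Bigl( \sum_{\xi \in -S} e(x, \xi) \Bigr)^{h_2} = \sum_{\xi \in Z} r_{h_1, h_2}(\xi) e(\xi \cdot x)$$

where $r_{h_1, h_2}$ is the counting function

$$r_{h_1, h_2}(\xi) := |\{(\xi_1, \ldots, \xi_{h_1}, \eta_1, \ldots, \eta_{h_2}) \in S^{h_1} \times S^{h_2} : \xi = \xi_1 + \cdots + \xi_{h_1} - \eta_1 - \cdots - \eta_{h_2}\}|.$$

By (4.2) we thus have

$$\sum_{\xi \in Z} r_{h_1, h_2}(\xi)^2 \le \|S\|_{\Lambda(2h)}^{2h} |S|^h.$$

On the other hand, the function $r_{h_1, h_2}$ is supported in $h_1 S - h_2 S$, so by Cauchy-Schwarz

$$\sum_{\xi \in Z} r_{h_1, h_2}(\xi) \le |h_1 S - h_2 S|^{1/2} \Bigl( \sum_{\xi \in Z} r_{h_1, h_2}(\xi)^2 \Bigr)^{1/2} \le |h_1 S - h_2 S|^{1/2} \|S\|_{\Lambda(2h)}^{h} |S|^{h/2}.$$

But from the definition of $r_{h_1, h_2}$ we have

$$\sum_{\xi \in Z} r_{h_1, h_2}(\xi) = |S|^{h_1} |S|^{h_2} = |S|^h.$$

The claim follows.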
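The counting argument in this proof can be followed on a small example: the representation function $r_{h_1, h_2}$ has total mass $|S|^h$, its sum of squares equals the $2h$-th moment of the exponential sum, and Cauchy-Schwarz then bounds $|h_1 S - h_2 S|$ from below. The sketch assumes $Z = \mathbf{Z}_N$ with $\xi \cdot x = \xi x / N$; `representation_counts` and `moment` are hypothetical names.

```python
import cmath
import math
from collections import Counter
from itertools import product

def representation_counts(N, S, h1, h2):
    """r(xi) = number of ways to write xi as a sum of h1 elements of S minus h2 elements."""
    counts = Counter()
    for plus in product(S, repeat=h1):
        for minus in product(S, repeat=h2):
            counts[(sum(plus) - sum(minus)) % N] += 1
    return counts

def moment(N, S, h):
    """E_{x in Z_N} |sum_{xi in S} e(xi x / N)|^(2h), computed numerically."""
    total = 0.0
    for x in range(N):
        s = sum(cmath.exp(2j * math.pi * xi * x / N) for xi in S)
        total += abs(s) ** (2 * h)
    return total / N

N, S, h1, h2 = 31, [1, 2, 4, 8], 2, 1
r = representation_counts(N, S, h1, h2)
mass = sum(r.values())                       # equals |S|^h
energy = sum(v * v for v in r.values())      # equals the 2h-th moment
support = len(r)                             # equals |h1 S - h2 S|
print(mass, energy, moment(N, S, h1 + h2), support)
print(support * energy >= mass ** 2)         # Cauchy-Schwarz
```

The final inequality rearranges to $|h_1 S - h_2 S| \ge |S|^{2h} / \sum_\xi r(\xi)^2$, which is the form in which the lemma follows once the sum of squares is bounded via the $\Lambda(2h)$ constant.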





We now investigate the $\Lambda(p)$ constants of Sidon-like sets as $p \to \infty$.


Definition 4.32 An additive set $S$ with cardinality $|S| = d$ is said to be dissociated
if the cube $[0, 1] \cdot S$ is proper, or in other words, the $2^d$ subset sums

$$FS(S) := \Bigl\{ \sum_{\xi \in S} \epsilon_\xi \xi : \epsilon_\xi \in \{0, 1\} \Bigr\}$$

are all distinct.


This should be compared with the concept of a Sidon set, which is a set $S$
of cardinality $d$ whose $\binom{d+1}{2}$ pairwise sums $\{\xi_1 + \xi_2 : \xi_1, \xi_2 \in S\}$ are all distinct
(except for the trivial identification $\xi_1 + \xi_2 = \xi_2 + \xi_1$). A good example of a
dissociated set is the set of powers of 2: $S = \{1, 2, \ldots, 2^n\}$ in any cyclic group $\mathbf{Z}/N\mathbf{Z}$
with $N > 2^{n+1}$. Observe that if $S$ is a dissociated set of cardinality $d$, and $v$ is a
non-zero element of $[-1, 1]^d$, then $v \cdot S \ne 0$ (since otherwise we could find two
disjoint sets $S_1, S_2$ in $S$, corresponding to where the components of $v$ are $+1$ or
$-1$, such that $\sum_{\xi \in S_1} \xi = \sum_{\xi \in S_2} \xi$).
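Dissociativity of a small set can be verified by enumerating subset sums, and the observation about signed sums $v \cdot S$ can be checked the same way. This is a brute-force sketch in $\mathbf{Z}/N\mathbf{Z}$; the function names `is_dissociated` and `has_vanishing_signed_sum` are illustrative.

```python
from itertools import product

def is_dissociated(S, modulus):
    """Return True if all 2^|S| subset sums of S are distinct modulo `modulus`."""
    S = list(S)
    seen = set()
    for eps in product((0, 1), repeat=len(S)):
        total = sum(e * x for e, x in zip(eps, S)) % modulus
        if total in seen:
            return False
        seen.add(total)
    return True

def has_vanishing_signed_sum(S, modulus):
    """Return True if some non-zero v in {-1, 0, 1}^|S| has v . S = 0 modulo `modulus`."""
    S = list(S)
    for v in product((-1, 0, 1), repeat=len(S)):
        if any(v) and sum(c * x for c, x in zip(v, S)) % modulus == 0:
            return True
    return False

powers = [1, 2, 4, 8]  # n = 3, so the text's condition is N > 2^(n+1) = 16
print(is_dissociated(powers, 17), has_vanishing_signed_sum(powers, 17))
print(is_dissociated([1, 2, 3], 17), has_vanishing_signed_sum([1, 2, 3], 17))
```

In both examples the two tests agree: a set is dissociated exactly when no non-trivial signed combination with coefficients in $\{-1, 0, 1\}$ vanishes, since such a combination is a coincidence between two disjoint subset sums.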

Dissociativity is the Fourier analog of joint independence. It leads to the fol- 
lowing Fourier-analytic analog of Chernoff’s inequality: 


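Lemma 4.33 (Rudin's inequality) If $S$ is dissociated, then we have

$$\mathbf{E}_{x \in Z} \exp\Bigl( \sigma \, \mathrm{Re} \sum_{\xi \in S} c(\xi) e(\xi \cdot x) \Bigr) \le e^{\sigma^2 / 2} \quad (4.33)$$

whenever $\|c\|_{l^2(S)} \le 1$ and $\sigma \ge 0$. We also have the distributional estimates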
Prez | Po cee + x)] = | =O) (4.34) 
EES 
for every $\varepsilon > 0$, and the $\Lambda(p)$ estimate

$$\|S\|_{\Lambda(p)} = O(\sqrt{p}) \quad (4.35)$$

for all $2 \le p < \infty$.


Note that when $p = 2h$ then $(h!)^{1/2h}$ is comparable to $\sqrt{p}$ by Stirling's formula
(1.52), and hence (4.35) shows that dissociated sets are comparable in $\Lambda(2h)$
constant to $B_h$ sets for any given $h$ (if $S$ is sufficiently large). This also shows
that the bounds in the above lemma cannot be significantly improved except in
the constants, even if one imposes even more additive independence conditions
on $S$.
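The exponential moment bound in Rudin's inequality, $\mathbf{E}_{x \in Z} \exp(\sigma \, \mathrm{Re} \sum_{\xi \in S} c(\xi) e(\xi \cdot x)) \le e^{\sigma^2/2}$ for $\|c\|_{l^2(S)} \le 1$, can be observed numerically for a small dissociated set. The sketch below assumes $Z = \mathbf{Z}_N$ with $\xi \cdot x = \xi x / N$; the function name `exp_moment` and the particular coefficient choices are illustrative.

```python
import cmath
import math

def exp_moment(N, S, c, sigma):
    """E_{x in Z_N} exp(sigma * Re sum_j c[j] * e(S[j] * x / N))."""
    total = 0.0
    for x in range(N):
        value = sum(
            coeff * cmath.exp(2j * math.pi * xi * x / N)
            for xi, coeff in zip(S, c)
        )
        total += math.exp(sigma * value.real)
    return total / N

N = 17
S = [1, 2, 4, 8]                 # dissociated in Z_17
c = [0.5, 0.5j, -0.5, 0.5]       # complex coefficients with l^2 norm exactly 1
for sigma in (0.5, 1.0, 2.0, 3.0):
    print(sigma, exp_moment(N, S, c, sigma), math.exp(sigma ** 2 / 2))
```

The left-hand values stay below $e^{\sigma^2/2}$ for every $\sigma$ tried, and the gap widens as $\sigma$ grows, reflecting the slack in the elementary inequality $\cosh(x) \le e^{x^2/2}$.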


Proof Write $c(\xi) = |c(\xi)| e(\theta_\xi)$ for some phase $\theta_\xi \in \mathbf{R}/\mathbf{Z}$. We begin by observing
the inequality

$$e^{tx} \le \cosh(x) + t \sinh(x)$$

for all $x \ge 0$ and $-1 \le t \le 1$, which is simply a consequence of the convexity of
$e^{tx}$ as a function of $t$. In particular we see that

$$\exp(\sigma \, \mathrm{Re} \, c(\xi) e(x, \xi)) \le \cosh(\sigma |c(\xi)|) + \sinh(\sigma |c(\xi)|) \, \mathrm{Re} \, e(\xi \cdot x + \theta_\xi),$$

which upon multiplying and taking expectations becomes

$$\mathbf{E}_{x \in Z} \exp\Bigl( \sigma \sum_{\xi \in S} \mathrm{Re} \, c(\xi) e(x, \xi) \Bigr) \le \mathbf{E}_{x \in Z} \prod_{\xi \in S} \Bigl( \cosh(\sigma |c(\xi)|) + \frac{1}{2} \sinh(\sigma |c(\xi)|) e(\xi \cdot x + \theta_\xi) + \frac{1}{2} \sinh(\sigma |c(\xi)|) e(-\xi \cdot x - \theta_\xi) \Bigr).$$

Now we multiply the product out and inspect its behavior in $x$. We obtain a
large number of terms ($3^{|S|}$, to be exact) that are of the form $e((v \cdot S) \cdot x)$, for
some $v \in [-1, 1]^{|S|}$, times some constant independent of $x$, where we select
some enumeration $S = (\xi_1, \ldots, \xi_{|S|})$ of $S$. There is one constant term, namely
$\prod_{\xi \in S} \cosh(\sigma |c(\xi)|)$, but all the others have a non-zero frequency vector $v \cdot S$
because $S$ is dissociated, and thus integrate out to zero by the Fourier inversion
formula. Thus we have

$$\mathbf{E}_{x \in Z} \exp\Bigl( \sigma \sum_{\xi \in S} \mathrm{Re} \, c(\xi) e(\xi \cdot x) \Bigr) \le \prod_{\xi \in S} \cosh(\sigma |c(\xi)|),$$

and the claim (4.33) then follows from the elementary inequality $\cosh(x) \le e^{x^2/2}$

(which follows by comparing Taylor series). From Markov’s inequality we thus 


obtain 


Prez (rezac x) > ) er ees 


EES 


for every A > 0; choosing o := 4/2, we obtain 


Prez (reZ eoe -x)> 7 < eh, 


EES 


Replacing à by (1 — £)à and rotating c(&) by an arbitrary angle e(@), we obtain 


Prez (Rew Yi c&eE +x) = = e) Se 
EES 

If take the union of these estimates as e’? 

(depending on £) we obtain (4.34). 


To obtain (4.35), we observe from the identity 


o0 
= pf APTIP,ez 
0 


varies over a finite number of angles 


=a) dÀ 


P 


Voc e(& + x) 


EES 


Yo cO eE- x) 


EES 




















L?(Z) 


178 


4 Fourier-analytic methods 


and (4.34) (with € = 1, say) that 


p 5o 
Ycee- =O (r Í pede MIS a) 
0 


EES 














L?(Z) 


To estimate the integral, we observe from elementary calculus that the integrand 
aP-le-*/5 is bounded by O(p)?”? for A = O(./p), and then decays exponentially 
for à >> ./p. From this we can easily bound the integrand by p? O(p)?/”, and 


the claim follows (note that p!/? is bounded by e). 














In the next few sections we shall use Rudin’s inequality to obtain structural 
control on various sets of frequencies. 


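Exercises

4.5.1 Show that the $\Lambda(p)$ constant of a set $S$ does not depend on the choice of
bilinear form used to define the Fourier transform, and is also invariant
under translations or isomorphisms of the set $S$.

4.5.2 For any $2 \le p \le \infty$ and any disjoint $S_1, S_2$, show the triangle inequality
$\|S\|_{\Lambda(p)} \le \|S_1\|_{\Lambda(p)} + \|S_2\|_{\Lambda(p)}$ whenever $S \subseteq S_1 \cup S_2$.

4.5.3 Let $\epsilon$ be the uniform distribution on $\{-1, 1\}$, and let $\epsilon_1, \ldots, \epsilon_N$ be
independent trials of $\epsilon$. If $c_1, \ldots, c_N$ are arbitrary complex numbers and
$2 \le p < \infty$, prove Bernstein's inequality [25]

$$\Bigl( \sum_{j=1}^N |c_j|^2 \Bigr)^{1/2} \le \Bigl( \mathbf{E} \Bigl| \sum_{j=1}^N \epsilon_j c_j \Bigr|^p \Bigr)^{1/p} \le O\Bigl( \sqrt{p} \Bigl( \sum_{j=1}^N |c_j|^2 \Bigr)^{1/2} \Bigr).$$

(Hint: for the lower bound, compute the $p = 2$ moment. For the upper
bound, modify the proof of Lemma 4.33; alternatively, apply Lemma 4.33
to the group $Z = \mathbf{Z}_2^N$, where $S$ is the standard basis for $\mathbf{Z}_2^N$.) Conclude
that if $f_1, \ldots, f_N$ are any complex-valued functions on $Z$, then we have
Khintchine's inequality

$$\Bigl\| \Bigl( \sum_{j=1}^N |f_j|^2 \Bigr)^{1/2} \Bigr\|_{L^p(Z)} \le \Bigl( \mathbf{E} \Bigl\| \sum_{j=1}^N \epsilon_j f_j \Bigr\|_{L^p(Z)}^p \Bigr)^{1/p} \le O\Bigl( \sqrt{p} \Bigl\| \Bigl( \sum_{j=1}^N |f_j|^2 \Bigr)^{1/2} \Bigr\|_{L^p(Z)} \Bigr).$$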


4.5.4 Let $f : Z_1 \times Z_2 \to \mathbf{C}$ be a function on two variables in two non-empty
finite sets $Z_1, Z_2$, and let $2 \le p \le \infty$. Establish the Minkowski
inequality

$$\bigl( \mathbf{E}_{y \in Z_2} ( \mathbf{E}_{x \in Z_1} |f(x, y)|^2 )^{p/2} \bigr)^{1/p} \le \bigl( \mathbf{E}_{x \in Z_1} ( \mathbf{E}_{y \in Z_2} |f(x, y)|^p )^{2/p} \bigr)^{1/2}. \quad (4.36)$$

(Hint: use the triangle inequality for the $L^{p/2}$ norm.) Conclude that
$\|S\|_{\Lambda(p)}$ is the best constant such that

$$\Bigl\| \Bigl\| \sum_{\xi \in S} c(\xi) e(x, \xi) \Bigr\|_H \Bigr\|_{L^p(Z)} \le \|S\|_{\Lambda(p)} \Bigl( \sum_{\xi \in S} \|c(\xi)\|_H^2 \Bigr)^{1/2}$$

for all finite-dimensional Hilbert spaces $H$ and all sequences $(c(\xi))_{\xi \in S}$
taking values in $H$. Using this, conclude that $\|S_1 \times S_2\|_{\Lambda(p)} =
\|S_1\|_{\Lambda(p)} \|S_2\|_{\Lambda(p)}$ whenever $S_1, S_2$ are additive sets in finite additive
groups $Z_1, Z_2$ and $2 \le p \le \infty$.

4.5.5 [33], [20] Let $n \ge 1$ be an integer, let $Z := \mathbf{Z}_2^n$. For $\xi = (\xi_1, \ldots, \xi_n) \in \mathbf{Z}_2^n$,
let $|\xi|$ denote the number of coefficients $\xi_1, \ldots, \xi_n$ which are equal to one.
Establish the Bonami-Beckner inequality

$$\Bigl\| \sum_{\xi \in Z} \varepsilon^{|\xi|} c(\xi) e(x, \xi) \Bigr\|_{L^{1 + 1/\varepsilon^2}(Z)} \le \|c\|_{l^2(Z)}$$

for all $0 < \varepsilon \le 1$ and all $c \in l^2(Z)$. (Hint: first establish this by hand
for $n = 1$, and then exploit (4.36) to obtain the general case.) Conclude
in particular that if $S_k := \{\xi \in \mathbf{Z}_2^n : |\xi| = k\}$, then $\|S_k\|_{\Lambda(p)} \le (p - 1)^{k/2}$
for all $2 \le p < \infty$.

4.5.6 Let $2 \le p \le \infty$, and let $S$ be a non-empty subset of $Z$. Prove (4.31).
(Hint: use the Hausdorff-Young inequality.) If $2 < p < \infty$, show that
equality occurs if and only if $S$ is a translate of a subgroup of $Z$. (You
may need Exercise 4.2.9.)

4.5.7 Let $S$ be an additive set in a finite additive group. Show that

$$\|S\|_{\Lambda(p)} \ge \max\bigl( 1, |Z|^{-1/p} |S|^{1/2} \bigr)$$

for all $2 \le p \le \infty$. It turns out that these bounds are essentially sharp for
randomly chosen sets $S$ in $Z$ of a fixed cardinality: see [35].

4.5.8 Let $S$ be a $B_h$ set in a finite additive group $Z$. Show that $|S| = O_h(|Z|^{1/h})$.

4.5.9 Complete the proof of Proposition 4.29.

4.5.10 Let $S$ be an additive subset of $Z$. Show that $E(S, S) \le \|S\|_{\Lambda(4)}^4 |S|^2$;
thus the additive energy of an additive set is controlled by its $\Lambda(4)$
constant.


4.5.11 Let $S$ be an additive set, and let $h \ge 1$. Suppose that $A > 0$ is a constant
such that

$$|hS'| \ge \frac{|S'|^h}{A^{2h}}$$

for all non-empty subsets $S'$ of $S$. Show that

$$\|S\|_{\Lambda(2h)} = O(A(1 + \log |S|));$$

thus Lemma 4.30 can be reversed after conceding a factor of a logarithm.
(Hint: first verify the estimate (4.30) when $c$ is a characteristic function
by reversing the proof of Lemma 4.30. For general $c$, decompose $c$ into at
most $O(1 + \log |S|)$ functions which are comparable to constant multiples
of characteristic functions, by partitioning the range of $c$ using powers
of 2, and discarding those values of $c$ smaller than (say) $|S|^{-1} \|c\|_{l^2}$.)

4.5.12 [251] Show that $\|S\|_{\Lambda(p)}$ is the best constant such that


II Fllecsy < ISl llf lz 


for all random variables f, where p’ is the dual exponent to p, thus 
1/p + 1/p' = 1. Next, write 

isi 
IZ] 





IF lizz + Exyezf@) FOU yY eE E y) 


EES 


Ay2 
IFz = 
and observe the inequalities 


Ex yez f(x)g(yMa # y) ) eE- a- y) 


EES 


< Ifl lellzzz 








and 


E, yez f8 ONE # y) eE- (x — y)) 


EES 


< [ZIS f zo lalz. 








Using Riesz—Thorin interpolation (or arguing as in Exercise 4.2.3) con- 
clude that 


E, yez f(x)g(yIa # y) X eE- (x — y) 


EES 
< (ZMS h)? 
< (ZS hu lfl lelez 


From this, conclude the Tomas—Stein inequality 








—2 —2 
ISlic < SIZI? + Sla ZD? 


(compare with (4.31)). Thus, Fourier-uniform sets tend to have fairly 
small A(p) constants. See also Lemma 10.22. 


4.6 The spectrum of an additive set


We now use Fourier analysis to investigate the spectral properties of additive sets
$A$ which have high additive energy $E(A, A)$; examples of such sets include sets
with small sum set $|A + A|$ or small difference set $|A - A|$ (cf. (2.8)). One can
already conclude from estimates such as (4.23) that such sets must be highly
non-uniform, i.e. $1_A$ contains non-trivial Fourier coefficients. However, this by itself is
not the strongest Fourier-analytic statement one can say about such sets. In order to
proceed further it is convenient to introduce the notion of the $\alpha$-spectrum of a set.


Definition 4.34 (Spectrum) Let $A$ be an additive set in a finite additive group $Z$
with a non-degenerate symmetric bilinear form $\cdot$ and let $\alpha \in \mathbf{R}$ be a parameter. We
define the $\alpha$-spectrum $\mathrm{Spec}_\alpha(A) \subseteq Z$ to be the set

$$\mathrm{Spec}_\alpha(A) := \{\xi \in Z : |\widehat{1_A}(\xi)| \geq \alpha \mathbf{P}_Z(A)\}.$$


One could define this spectrum without the assistance of the bilinear form $\cdot$, but
then it would be a subset of the Pontryagin dual group $\hat Z$ rather than $Z$.

From Lemma 4.9 we see that the sets $\mathrm{Spec}_\alpha(A)$ are symmetric, decreasing in $\alpha$,
empty for $\alpha > 1$, contain the origin for $\alpha \leq 1$, and are the whole space $Z$ whenever
$\alpha \leq 0$. Thus the spectrum is really only an interesting concept when $0 < \alpha \leq 1$.
In the extreme case $\alpha = 1$ the spectrum becomes a group, see Exercise 4.6.2.

From (4.16) (and Markov's inequality) we observe the upper bound

$$|\mathrm{Spec}_\alpha(A)| \leq \alpha^{-2}/\mathbf{P}_Z(A) \tag{4.37}$$

on the cardinality of the $\alpha$-spectrum. In fact we can use Rudin's inequality to
obtain a more precise structural statement, in which the polynomial loss in $\mathbf{P}_Z(A)$
is replaced with a logarithmic loss. To prove this statement, we first need an easy
lemma (cf. Corollary 1.42).
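To make Definition 4.34 and the bound (4.37) concrete, here is a small numerical sketch in the cyclic group $Z_N$ with the standard bilinear form $\xi \cdot x = x\xi/N$. The function names, the test set and the rounding tolerance are illustrative assumptions, not part of the text.

```python
import cmath

def indicator_fourier(A, N):
    """Fourier coefficients of 1_A on Z_N: (1/N) * sum_{x in A} e(-x*xi/N)."""
    return [sum(cmath.exp(-2j * cmath.pi * x * xi / N) for x in A) / N
            for xi in range(N)]

def spectrum(A, N, alpha, tol=1e-9):
    """Spec_alpha(A) = {xi in Z_N : |hat 1_A(xi)| >= alpha * P_Z(A)}.

    The tolerance (an assumption of this sketch) keeps xi = 0 in the set
    at alpha = 1 despite floating-point rounding.
    """
    density = len(A) / N
    hat = indicator_fourier(A, N)
    return {xi for xi in range(N) if abs(hat[xi]) >= alpha * density - tol}

N = 60
A = set(range(0, N, 5)) | {1, 2}          # a subgroup plus two extra points
for alpha in (0.25, 0.5, 0.75, 1.0):
    spec = spectrum(A, N, alpha)
    bound = alpha ** -2 / (len(A) / N)    # right-hand side of (4.37)
    print(alpha, sorted(spec), bound)
```

For a set that is close to a subgroup, the spectrum is essentially the orthogonal complement of that subgroup for a wide range of $\alpha$, which is why the printed sets barely change with $\alpha$ here.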


Lemma 4.35 (Cube covering lemma) [36] Let $S$ be an additive set in an ambient
group $Z$, and let $d \geq 1$ be an integer. Then we can partition $S = D_1 \cup \cdots \cup D_k \cup R$
where $D_1, \ldots, D_k$ are disjoint dissociated subsets of $S$ of cardinality
$d + 1$, and the remainder set $R$ is contained in a cube $[-1, 1]^d \cdot (\eta_1, \ldots, \eta_d)$ for
some $\eta_1, \ldots, \eta_d \in Z$.


Proof We use the greedy algorithm. We initially set $k = 0$. If we can find a
dissociated subset $D$ of $S$ of cardinality $d + 1$, we remove it from $S$ and add it to
the collection $D_1, \ldots, D_k$, thus incrementing $k$ to $k + 1$. We continue in this manner
until we are left with a remainder $R$ such that all dissociated subsets of $R$ have
cardinality $d$ or less. Let $\{\eta_1, \ldots, \eta_{d'}\}$ be a dissociated subset of $R$ with maximal
cardinality; thus $d' \leq d$. Observe that if $R$ contained an element $\xi$ which was not
contained in $[-1, 1]^{d'} \cdot (\eta_1, \ldots, \eta_{d'})$, then $\{\eta_1, \ldots, \eta_{d'}, \xi\}$ would be dissociated,
so contradicting maximality of $d'$. Thus we have $R \subseteq [-1, 1]^{d'} \cdot (\eta_1, \ldots, \eta_{d'})$,
and the claim follows (padding out the progression with some dummy elements
$\eta_{d'+1}, \ldots, \eta_d$ if necessary).
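The greedy argument can be sketched in code for $Z = Z_N$. This is a variant under stated assumptions rather than the proof verbatim: instead of searching for a dissociated subset of maximal cardinality, it grows an inclusion-maximal dissociated set element by element, which is all the containment step of the proof uses. The helper names `cube`, `is_dissociated` and `cube_covering` are hypothetical.

```python
from itertools import product

def cube(gens, N):
    """The cube [-1,1]^k . (gens): all sums of eps_i * gens[i] mod N, eps_i in {-1,0,1}."""
    sums = {0}
    for g in gens:
        sums = {(s + e * g) % N for s in sums for e in (-1, 0, 1)}
    return sums

def is_dissociated(gens, N):
    """True if no non-trivial combination with coefficients in {-1,0,1} vanishes mod N."""
    return all(sum(e * g for e, g in zip(eps, gens)) % N != 0
               for eps in product((-1, 0, 1), repeat=len(gens)) if any(eps))

def cube_covering(S, N, d):
    """Split S into dissociated blocks of size d+1 plus a remainder inside a cube of rank <= d."""
    remaining = sorted(set(x % N for x in S))
    blocks = []
    while True:
        D = []
        for x in remaining:
            # Adding x keeps D dissociated exactly when x lies outside the cube of D.
            if x not in cube(D, N):
                D.append(x)
                if len(D) == d + 1:
                    break
        if len(D) <= d:
            return blocks, remaining, D   # every remaining element lies in cube(D)
        blocks.append(D)
        remaining = [x for x in remaining if x not in D]

blocks, remainder, gens = cube_covering(range(1, 30), 101, 3)
print(blocks, remainder, gens)
```

Each pass either removes $d + 1$ elements or terminates, so the loop runs at most $|S|/(d+1) + 1$ times.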














Lemma 4.36 (Fourier concentration lemma) [48] Let $A$ be an additive set in
a finite additive group $Z$, and let $0 < \alpha \leq 1$. Then there exist
$d = O(\alpha^{-2}(1 + \log\frac{1}{\mathbf{P}_Z(A)}))$ and frequencies $\eta_1, \ldots, \eta_d \in Z$ such that

$$\mathrm{Spec}_\alpha(A) \subseteq [-1, 1]^d \cdot (\eta_1, \ldots, \eta_d).$$


This result is essentially sharp in a number of ways; see [146]. 


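Proof It will suffice to show that for each phase $\theta \in \mathbf{R}/\mathbf{Z}$, the set

$$S_\theta := \{\xi \in Z : \mathrm{Re}\, e(\theta)\widehat{1_A}(\xi) \geq \tfrac{\alpha}{2}\mathbf{P}_Z(A)\}$$

can be contained in a progression of the desired form, since from Definition 4.34
we see that $\mathrm{Spec}_\alpha(A)$ is contained in the union of a bounded number of the $S_\theta$, and
we can simply add all the progressions together (here the fact that we have $\alpha/2$
instead of $\alpha$ in the definition of $S_\theta$ is critical).

Fix $\theta$. By Lemma 4.35, it will suffice to show that

$$|S'| \leq C\alpha^{-2}\Big(1 + \log\frac{1}{\mathbf{P}_Z(A)}\Big)$$

for all dissociated sets $S'$ in $S_\theta$. But if $S' \subseteq S_\theta$, then by definition of $S_\theta$

$$\mathrm{Re}\, e(\theta)\sum_{\xi\in S'}\widehat{1_A}(\xi) \geq \frac{\alpha}{2}\mathbf{P}_Z(A)|S'|.$$

Let $f(x) := |S'|^{-1/2}\sum_{\xi\in S'} e(\xi\cdot x)$ be the normalized inverse Fourier transform of
$1_{S'}$; then by (4.3) the left-hand side is equal to $\mathrm{Re}\, e(\theta)|S'|^{1/2}\mathbf{E}_Z 1_A \overline{f}$. Thus we
have

$$\mathbf{E}_Z 1_A |f| \geq \frac{\alpha}{2}\mathbf{P}_Z(A)|S'|^{1/2}.$$

The left-hand side can be rewritten as

$$\mathbf{E}_Z 1_A |f| = \int_0^\infty \mathbf{P}_{x\in Z}(x \in A;\ |f(x)| \geq \lambda)\, d\lambda,$$

cf. (1.6). To bound $\mathbf{P}_{x\in Z}(x \in A;\ |f(x)| \geq \lambda)$, we can either use the trivial bound
of $\mathbf{P}_Z(A)$ or use (4.34) to obtain a bound of $Ce^{-\lambda^2/C}$ (for instance). Thus we have

$$\int_0^\infty \min\big(\mathbf{P}_Z(A),\ Ce^{-\lambda^2/C}\big)\, d\lambda \geq \frac{\alpha}{2}\mathbf{P}_Z(A)|S'|^{1/2}.$$

The left-hand side is at most $C\mathbf{P}_Z(A)(1 + \log^{1/2}\frac{1}{\mathbf{P}_Z(A)})$ and the claim follows.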
















The above lemma suggests that the spectrum has some additive structure. This
is confirmed by the following closure properties of the $\alpha$-spectrum under addition:


Lemma 4.37 Let $A$ be an additive set in a finite additive group $Z$, and let $\varepsilon, \varepsilon' > 0$.
Then we have

$$\mathrm{Spec}_{1-\varepsilon}(A) + \mathrm{Spec}_{1-\varepsilon'}(A) \subseteq \mathrm{Spec}_{1-2(\varepsilon+\varepsilon')}(A). \tag{4.38}$$

In a similar spirit, for any $0 < \alpha \leq 1$ and for any non-empty $S \subseteq \mathrm{Spec}_\alpha(A)$ we
have

$$|\{(\xi_1, \xi_2) \in S \times S : \xi_1 - \xi_2 \in \mathrm{Spec}_{\alpha^2/2}(A)\}| \geq \frac{\alpha^2}{2}|S|^2. \tag{4.39}$$


See Exercise 4.6.2 for the $\varepsilon = 0$ case of this lemma. This lemma should be
compared with Lemma 2.33. Indeed there is a strong analogy between the spectra
$\mathrm{Spec}_\alpha(A)$ and the symmetry sets $\mathrm{Sym}_\alpha(A)$, which are heuristically dual to each
other.


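Proof We first prove (4.38). Let $\xi \in \mathrm{Spec}_{1-\varepsilon}(A)$ and $\xi' \in \mathrm{Spec}_{1-\varepsilon'}(A)$; then there exist
phases $\theta, \theta' \in \mathbf{R}/\mathbf{Z}$ such that

$$\mathrm{Re}\, \mathbf{E}_{x\in Z}\, e(\xi\cdot x + \theta)1_A(x) \geq (1-\varepsilon)\mathbf{P}_Z(A);$$
$$\mathrm{Re}\, \mathbf{E}_{x\in Z}\, e(\xi'\cdot x + \theta')1_A(x) \geq (1-\varepsilon')\mathbf{P}_Z(A).$$

Since $\mathrm{Re}\, \mathbf{E}_{x\in Z} 1_A = \mathbf{P}_Z(A)$, we thus have

$$\mathrm{Re}\, \mathbf{E}_{x\in Z}\, [2e(\xi\cdot x + \theta) + 2e(\xi'\cdot x + \theta') - 3]\, 1_A(x) \geq (1 - 2(\varepsilon+\varepsilon'))\mathbf{P}_Z(A).$$

To conclude that $\xi + \xi' \in \mathrm{Spec}_{1-2(\varepsilon+\varepsilon')}(A)$, it will thus suffice to establish the
pointwise estimate

$$\mathrm{Re}\, [2e(\xi\cdot x + \theta) + 2e(\xi'\cdot x + \theta') - 3] \leq \mathrm{Re}\, [e((\xi+\xi')\cdot x + \theta + \theta')].$$

Writing $e(\xi\cdot x + \theta) = e^{i\beta}$ and $e(\xi'\cdot x + \theta') = e^{i\beta'}$, we reduce to showing

$$2\cos(\beta) + 2\cos(\beta') - 3 \leq \cos(\beta + \beta').$$

If $\cos(\beta) \leq 0$ or $\cos(\beta') \leq 0$ then the left-hand side is at most $-1$ and the estimate is trivial, so we may
take $-\pi/2 \leq \beta, \beta' \leq \pi/2$. But by the concavity of $\cos$ between $-\pi/2$ and $\pi/2$, we have

$$\begin{aligned}
2\cos(\beta) + 2\cos(\beta') - 3 &\leq 4\cos\Big(\frac{\beta+\beta'}{2}\Big) - 3 \\
&= 2\cos^2\Big(\frac{\beta+\beta'}{2}\Big) - 1 - 2\Big(1 - \cos\Big(\frac{\beta+\beta'}{2}\Big)\Big)^2 \\
&\leq \cos(\beta+\beta')
\end{aligned}$$

as desired.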




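Now we prove (4.39), which is due to Bourgain [41]. Set $a(\xi) := \mathrm{sgn}(\widehat{1_A}(\xi))$
for $\xi \in S$; thus

$$\mathbf{E}_{x\in Z} \sum_{\xi\in S} a(\xi)e(\xi\cdot x)1_A(x) = \sum_{\xi\in S}|\widehat{1_A}(\xi)| \geq \alpha\mathbf{P}_Z(A)|S|.$$

Applying Cauchy-Schwarz, we conclude

$$\mathbf{E}_{x\in Z} \Big|\sum_{\xi\in S} a(\xi)e(\xi\cdot x)\Big|^2 1_A(x) \geq \alpha^2\mathbf{P}_Z(A)|S|^2.$$

But the left-hand side can be rearranged as

$$\sum_{\xi_1,\xi_2\in S} a(\xi_1)\overline{a(\xi_2)}\,\widehat{1_A}(\xi_2 - \xi_1),$$

so by the triangle inequality (and the symmetry $|\widehat{1_A}(-\xi)| = |\widehat{1_A}(\xi)|$) we have

$$\sum_{\xi_1,\xi_2\in S} |\widehat{1_A}(\xi_1 - \xi_2)| \geq \alpha^2\mathbf{P}_Z(A)|S|^2.$$

In particular (cf. Exercise 1.1.4)

$$\sum_{\xi_1,\xi_2\in S:\ \xi_1-\xi_2\in\mathrm{Spec}_{\alpha^2/2}(A)} |\widehat{1_A}(\xi_1 - \xi_2)| \geq \frac{\alpha^2}{2}\mathbf{P}_Z(A)|S|^2$$

and (4.39) follows, since each summand is at most $\mathbf{P}_Z(A)$.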
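Both closure properties of Lemma 4.37 can be checked numerically on a small example. The sketch below works in $Z_{101}$ with $A$ an interval; the helper names, the choice of $A$, the parameters and the rounding tolerance are assumptions made for illustration only.

```python
import cmath

def normalized_coefficients(A, N):
    """|hat 1_A(xi)| / P_Z(A) for every xi in Z_N."""
    return [abs(sum(cmath.exp(-2j * cmath.pi * x * xi / N) for x in A)) / len(A)
            for xi in range(N)]

def spectrum(ratios, alpha, tol=1e-9):
    """Spec_alpha(A), read off from the normalized coefficients."""
    return {xi for xi, r in enumerate(ratios) if r >= alpha - tol}

N = 101
A = set(range(10))
ratios = normalized_coefficients(A, N)

# (4.38): Spec_{1-eps1} + Spec_{1-eps2} is contained in Spec_{1-2(eps1+eps2)}.
eps1, eps2 = 0.1, 0.05
sums = {(x + y) % N
        for x in spectrum(ratios, 1 - eps1)
        for y in spectrum(ratios, 1 - eps2)}
closure_holds = sums <= spectrum(ratios, 1 - 2 * (eps1 + eps2))

# (4.39): many pairs in S = Spec_alpha have their difference in Spec_{alpha^2/2}.
alpha = 0.5
S = spectrum(ratios, alpha)
target = spectrum(ratios, alpha ** 2 / 2)
good_pairs = sum(1 for x in S for y in S if (x - y) % N in target)
print(closure_holds, good_pairs, alpha ** 2 / 2 * len(S) ** 2)
```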














We now show that small sum sets force large spectra (cf. Exercise 4.3.9, or 
Exercise 4.6.3 below). 


Lemma 4.38 Let $A$ be an additive set in a finite additive group $Z$, and let
$0 < \alpha \leq 1$. For any integers $n, m \geq 0$ with $(n, m) \neq (0, 0)$, we have the lower bound
on sum sets

$$|nA - mA| \geq \frac{|A|}{|\mathrm{Spec}_\alpha(A)|\mathbf{P}_Z(A) + \alpha^{2(n+m)-2}}.$$


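Proof We may take $n, m \geq 0$. Consider the function $f = 1_A * \cdots * 1_A * 1_{-A} * \cdots * 1_{-A}$
formed by convolving $n$ copies of $1_A$ and $m$ copies of $1_{-A}$. Then $f$ is
non-negative and supported on $nA - mA$, and thus

$$\mathbf{E}_Z f \leq \mathbf{P}_Z(nA - mA)^{1/2}(\mathbf{E}_Z f^2)^{1/2}.$$

From (4.10) we have $\mathbf{E}_Z f = \mathbf{P}_Z(A)^{n+m}$. From (4.9) and (4.17) we have
$\hat f = \widehat{1_A}^{\,n}\, \overline{\widehat{1_A}}^{\,m}$. Combining these inequalities with (4.2) we see that

$$|nA - mA| \geq \frac{|Z|\mathbf{P}_Z(A)^{2(n+m)}}{\sum_{\xi\in Z}|\widehat{1_A}(\xi)|^{2(n+m)}}.$$

But

$$\begin{aligned}
\sum_{\xi\in Z}|\widehat{1_A}(\xi)|^{2(n+m)} &\leq \sum_{\xi\in\mathrm{Spec}_\alpha(A)}\mathbf{P}_Z(A)^{2(n+m)} + (\alpha\mathbf{P}_Z(A))^{2(n+m)-2}\sum_{\xi\notin\mathrm{Spec}_\alpha(A)}|\widehat{1_A}(\xi)|^2 \\
&\leq \mathbf{P}_Z(A)^{2(n+m)}|\mathrm{Spec}_\alpha(A)| + \alpha^{2(n+m)-2}\mathbf{P}_Z(A)^{2(n+m)-1}
\end{aligned}$$

and the claim follows.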
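As a sanity check on Lemma 4.38, the following sketch compares $|nA - mA|$ with the stated lower bound for a small set in $Z_{101}$. The function names and the example set are assumptions; the rounding tolerance can only enlarge the computed spectrum, which weakens the bound being tested and so keeps the comparison safe.

```python
import cmath

def spectrum_size(A, N, alpha, tol=1e-9):
    """|Spec_alpha(A)| for A in Z_N with the bilinear form xi . x = x*xi/N."""
    count = 0
    for xi in range(N):
        coeff = abs(sum(cmath.exp(-2j * cmath.pi * x * xi / N) for x in A)) / N
        if coeff >= alpha * len(A) / N - tol:
            count += 1
    return count

def iterated_sumset(A, N, n, m):
    """The set nA - mA in Z_N."""
    result = {0}
    for _ in range(n):
        result = {(s + a) % N for s in result for a in A}
    for _ in range(m):
        result = {(s - a) % N for s in result for a in A}
    return result

def lemma_4_38_bound(A, N, n, m, alpha):
    """Right-hand side |A| / (|Spec_alpha(A)| P_Z(A) + alpha^(2(n+m)-2))."""
    density = len(A) / N
    return len(A) / (spectrum_size(A, N, alpha) * density + alpha ** (2 * (n + m) - 2))

N = 101
A = {0, 1, 2, 3, 4, 50}
cases = [(n, m, alpha) for (n, m) in [(1, 1), (2, 0), (2, 2)] for alpha in (0.3, 0.6, 0.9)]
for n, m, alpha in cases:
    print(n, m, alpha, len(iterated_sumset(A, N, n, m)), lemma_4_38_bound(A, N, n, m, alpha))
```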





Now we consider the following inverse-type question: if A has additive structure 
in the sense that its energy E(A, A) is large or its difference set |A — A| is small, 
is it possible to approximate A (or a closely related set) by a Bohr set? We give two 
results of this type, one which places a relatively large Bohr set inside 2A — 2A, 
and another which places A — A inside a relatively small Bohr set. We begin with 
the former result, the main idea of which dates back to Bogolyubov. 


Proposition 4.39 [295] Let $0 < \alpha \leq 1$, and let $A$ be an additive set in a finite
additive group $Z$ such that $E(A, A) \geq 4\alpha^2|A|^3$. Then we have the inclusion

$$\mathrm{Bohr}\Big(\mathrm{Spec}_\alpha(A), \frac{1}{6}\Big) \subseteq 2A - 2A. \tag{4.40}$$


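Proof Let $x$ be any element of the Bohr set $\mathrm{Bohr}(\mathrm{Spec}_\alpha(A), \frac16)$, thus $\mathrm{Re}\, e(\xi\cdot x) \geq \frac12$
for all $\xi \in \mathrm{Spec}_\alpha(A)$. To show that $x \in 2A - 2A$, it would suffice to show that
$1_A * 1_A * 1_{-A} * 1_{-A}(x) \neq 0$. But from (4.4), (4.9), (4.17) we have

$$1_A * 1_A * 1_{-A} * 1_{-A}(x) = \sum_{\xi\in Z}|\widehat{1_A}(\xi)|^4 e(\xi\cdot x).$$

Now take real parts of both sides and use the hypothesis on $x$ to obtain

$$\begin{aligned}
1_A * 1_A * 1_{-A} * 1_{-A}(x) &= \sum_{\xi\in\mathrm{Spec}_\alpha(A)}|\widehat{1_A}(\xi)|^4\,\mathrm{Re}\, e(\xi\cdot x) + \sum_{\xi\notin\mathrm{Spec}_\alpha(A)}|\widehat{1_A}(\xi)|^4\,\mathrm{Re}\, e(\xi\cdot x) \\
&\geq \frac12\sum_{\xi\in\mathrm{Spec}_\alpha(A)}|\widehat{1_A}(\xi)|^4 - \sum_{\xi\notin\mathrm{Spec}_\alpha(A)}|\widehat{1_A}(\xi)|^4 \\
&= \frac12\sum_{\xi\in Z}|\widehat{1_A}(\xi)|^4 - \frac32\sum_{\xi\notin\mathrm{Spec}_\alpha(A)}|\widehat{1_A}(\xi)|^4 \\
&\geq \frac12\sum_{\xi\in Z}|\widehat{1_A}(\xi)|^4 - \frac32\alpha^2\mathbf{P}_Z(A)^2\sum_{\xi\in Z}|\widehat{1_A}(\xi)|^2 \\
&= \frac{E(A,A)}{2|Z|^3} - \frac32\alpha^2\mathbf{P}_Z(A)^3 \\
&> 0
\end{aligned}$$

as desired, where we have used the hypothesis on $\alpha$ in the last step.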
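The inclusion (4.40) can be observed directly in a small cyclic group. The sketch below takes $A$ to be an interval in $Z_{211}$, chooses the largest $\alpha$ permitted by the energy hypothesis, and compares the Bohr set with $2A - 2A$. All function names and the example are assumptions made for illustration.

```python
import cmath
from collections import Counter
from math import sqrt

def additive_energy(A, N):
    """E(A,A): number of (a1,a2,a3,a4) in A^4 with a1 + a2 = a3 + a4 in Z_N."""
    counts = Counter((a + b) % N for a in A for b in A)
    return sum(c * c for c in counts.values())

def spectrum(A, N, alpha, tol=1e-9):
    """Spec_alpha(A); the tolerance may only enlarge the set, which shrinks the Bohr set below."""
    return {xi for xi in range(N)
            if abs(sum(cmath.exp(-2j * cmath.pi * x * xi / N) for x in A)) / len(A)
            >= alpha - tol}

def bohr_set(S, N, rho):
    """{x in Z_N : the distance from xi*x/N to the nearest integer is < rho for all xi in S}."""
    def dist(r):
        return min(r, N - r) / N
    return {x for x in range(N) if all(dist(xi * x % N) < rho for xi in S)}

N = 211
A = set(range(20))
alpha = sqrt(additive_energy(A, N) / (4 * len(A) ** 3))   # E(A,A) = 4 alpha^2 |A|^3
spec = spectrum(A, N, alpha)
bohr = bohr_set(spec, N, 1 / 6)
D = {(a - b) % N for a in A for b in A}
two_A_minus_two_A = {(u + v) % N for u in D for v in D}
print(alpha, sorted(spec), sorted(bohr))
```

For an interval the Bohr set found this way is much smaller than $2A - 2A$, which illustrates that (4.40) is an inclusion and not an approximation of $2A - 2A$.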







Now we give a converse inclusion, which applies to sets of small difference
constant $\delta[A]$ but requires the spectral threshold to be very large.


Proposition 4.40 Let $K \geq 1$. If $A$ is an additive set in a finite additive group $Z$
such that $|A - A| \leq K|A|$ (i.e. $\delta[A] \leq K$) and $0 < \varepsilon < 1$, then

$$A - A \subseteq \mathrm{Bohr}(\mathrm{Spec}_{1-\varepsilon}(A - A), \sqrt{8\varepsilon K}).$$


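Proof Let $x, y \in A$ and $\xi \in \mathrm{Spec}_{1-\varepsilon}(A - A)$. Then there exists a phase $\theta \in \mathbf{R}/\mathbf{Z}$
such that

$$\mathrm{Re}\sum_{z\in A-A} e(\xi\cdot z + \theta) \geq (1-\varepsilon)|A - A|$$

and hence

$$\sum_{z\in A-A}(1 - \mathrm{Re}\, e(\xi\cdot z + \theta)) \leq \varepsilon|A - A| \leq \varepsilon K|A|.$$

Since the summand is non-negative, and $A - A$ contains both $x - a$ and $y - a$ for every $a \in A$,
we thus have

$$\sum_{a\in A}|1 - \mathrm{Re}\, e(\xi\cdot(x - a) + \theta)| \leq \varepsilon K|A|$$

and hence by Cauchy-Schwarz

$$\sum_{a\in A}|1 - \mathrm{Re}\, e(\xi\cdot(x - a) + \theta)|^{1/2} \leq \varepsilon^{1/2}K^{1/2}|A|.$$

From the elementary identity

$$|1 - e(\alpha)| = \sqrt{2}\,|1 - \mathrm{Re}\, e(\alpha)|^{1/2}$$

we conclude that

$$\sum_{a\in A}|1 - e(\xi\cdot(x - a) + \theta)| \leq \sqrt{2}\,\varepsilon^{1/2}K^{1/2}|A|.$$

Similarly for $x$ replaced by $y$. By the triangle inequality we conclude that

$$\sum_{a\in A}|e(\xi\cdot(y - a) + \theta) - e(\xi\cdot(x - a) + \theta)| \leq 2\sqrt{2}\,\varepsilon^{1/2}K^{1/2}|A|.$$

But the left-hand side is just $|A||e(\xi\cdot(x - y)) - 1|$; thus

$$|e(\xi\cdot(x - y)) - 1| \leq \sqrt{8\varepsilon K}.$$

Since $\xi \in \mathrm{Spec}_{1-\varepsilon}(A - A)$ was arbitrary, the claim follows from (4.24).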


In the next chapter we apply these propositions, together with the additive
geometry results from Chapter 3, to obtain Freiman-type theorems in finite additive
groups. For now, we shall give one striking application of the above machinery,
namely the following Gauss sum estimate of Bourgain and Konyagin:


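Theorem 4.41 [44] Let $F = F_p$ be a finite field of prime order, and let $H$ be a
multiplicative subgroup of $F$ such that $|H| \geq p^\delta$ for some $0 < \delta < 1$. Then, if $p$
is sufficiently large depending on $\delta$, we have $\|H\|_u \leq p^{-\varepsilon}\mathbf{P}_Z(H)$ for some $\varepsilon = \varepsilon(\delta) > 0$.
In other words, we have

$$\sup_{\xi\in Z_p\setminus 0}\Big|\sum_{x\in H} e(x\xi/p)\Big| \leq p^{-\varepsilon}|H|.$$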








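Proof We may use the standard bilinear form $\xi\cdot x = x\xi/p$. Since $h\cdot H = H$
for all $h \in H$, we easily verify that $\widehat{1_H}(h^{-1}\xi) = \widehat{1_H}(\xi)$ for all $h \in H$ and $\xi \in Z$.
This implies in particular that $\mathrm{Spec}_\alpha(H) = H\cdot\mathrm{Spec}_\alpha(H)$. Thus each $\mathrm{Spec}_\alpha(H)$
consists of multiplicative cosets of $H$, together with the origin 0.

We use an iteration and pigeonhole argument, similar to that used to prove
Theorem 2.35. Let $J = J(\delta) \geq 1$ be a large integer to be chosen later, and let
$\varepsilon = \varepsilon(J, \delta) > 0$ be a small number also to be chosen later. Define the sequence
$1 > \alpha_1 > \cdots > \alpha_{J+1} > 0$ by setting $\alpha_1 := p^{-\varepsilon}$ and $\alpha_{j+1} := \alpha_j^2/2$. Suppose for
contradiction that $\|H\|_u > p^{-\varepsilon}\mathbf{P}_Z(H)$; then $\mathrm{Spec}_{\alpha_1}(H)$ contains a non-zero element,
and hence by the preceding discussion $|\mathrm{Spec}_{\alpha_1}(H)| \geq |H| + 1 \geq p^\delta + 1$. Since
$\mathrm{Spec}_{\alpha_j}(H)$ is increasing in $j$, we see from the pigeonhole principle that there
exists $1 \leq j \leq J$ such that

$$|\mathrm{Spec}_{\alpha_{j+1}}(H)| \leq p^{1/J}|\mathrm{Spec}_{\alpha_j}(H)|.$$

On the other hand, from Lemma 4.37 we have

$$|\{(\xi_1, \xi_2) \in \mathrm{Spec}_{\alpha_j}(H)\times\mathrm{Spec}_{\alpha_j}(H) : \xi_1 - \xi_2 \in \mathrm{Spec}_{\alpha_{j+1}}(H)\}| \geq \frac{\alpha_j^2}{2}|\mathrm{Spec}_{\alpha_j}(H)|^2.$$

Applying Cauchy-Schwarz or Lemma 2.30 we conclude that

$$E(\mathrm{Spec}_{\alpha_j}(H), \mathrm{Spec}_{\alpha_j}(H)) = \Omega_J\big(p^{-O_J(\varepsilon) - O(1/J)}|\mathrm{Spec}_{\alpha_j}(H)|^3\big).$$

If we let $A := \mathrm{Spec}_{\alpha_j}(H)\setminus\{0\}$, we thus obtain

$$E(A, A) = \Omega_J\big(p^{-O_J(\varepsilon) - O(1/J)}|A|^3\big)$$

since $|A| \geq p^\delta$, $J$ is large enough depending on $\delta$, and $\varepsilon$ small enough depending
on $J, \delta$. But $A$ is a union of cosets $x\cdot H$ of $H$ for various $x \in F_p\setminus\{0\}$. Applying
Exercise 2.3.20, we conclude that for at least one such $x$

$$E(A, x\cdot H) = \Omega_J\big(p^{-O_J(\varepsilon) - O(1/J)}|A||H|^2\big).$$

Dilating this by $x^{-1}$ we obtain

$$E(x^{-1}\cdot A, H) = \Omega_J\big(p^{-O_J(\varepsilon) - O(1/J)}|A||H|^2\big).$$

But this will contradict Corollary 2.62 if $J$ is sufficiently large depending on $\delta$,
and $\varepsilon$ sufficiently small.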
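The quantities in Theorem 4.41 are easy to compute for a specific prime. The sketch below evaluates the exponential sums over the subgroup of fourth powers in $F_{101}$ and also checks the dilation invariance used at the start of the proof. The function names and the choice $p = 101$ are assumptions; for a subgroup this large, the classical bound $|\sum_{x \in H} e(x\xi/p)| \leq \sqrt{p}$ already gives cancellation, so the example illustrates the objects involved rather than the strength of the theorem.

```python
import cmath

def power_subgroup(p, k):
    """The multiplicative subgroup {x^k : x in F_p^*} of F_p, for a prime p."""
    return {pow(x, k, p) for x in range(1, p)}

def exponential_sum(H, p, xi):
    """sum_{x in H} e(x*xi/p)."""
    return sum(cmath.exp(2j * cmath.pi * x * xi / p) for x in H)

p = 101
H = power_subgroup(p, 4)                  # fourth powers: a subgroup of order 25
worst = max(abs(exponential_sum(H, p, xi)) for xi in range(1, p))
print(len(H), worst, worst / len(H))

# Dilation invariance: replacing xi by h*xi with h in H permutes the summands.
xi = 3
invariant = all(abs(exponential_sum(H, p, xi) - exponential_sum(H, p, h * xi % p)) < 1e-9
                for h in H)
print(invariant)
```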














In [40] this result was extended (using slightly different arguments) to the case 
where H was not a multiplicative subgroup, but merely had small multiplicative 
doubling, for instance |H - H| < p*|H|. In [41] the result was further extended to 
the case where the field $F_p$ was replaced by a commutative ring such as $F_p \times F_p$
(with Theorem 2.63 playing a key role in the latter result). This yields some
estimates on exponential sums related to the Diffie-Hellman distribution and to 
Mordell sums; see [40], [41] for further discussion. 


Exercises 


4.6.1 Let $A$ be an additive set in a finite additive group $Z$ and let $\alpha \in \mathbf{R}$.
Show that $A$, $-A$, and $T^h A$ all have the same spectrum for any $h \in Z$;
thus $\mathrm{Spec}_\alpha(A) = \mathrm{Spec}_\alpha(-A) = \mathrm{Spec}_\alpha(T^h A)$. If $\phi : Z \to Z$ is a group
isomorphism of $Z$, show that $\mathrm{Spec}_\alpha(\phi(A)) = (\phi^\dagger)^{-1}(\mathrm{Spec}_\alpha(A))$, where $\phi^\dagger$ is
the adjoint of $\phi$, defined in Exercise 4.1.8.

4.6.2 Let $A$ be an additive set in $Z$. Show that the spectrum $\mathrm{Spec}_1(A)$ is a
group and is in fact equal to $(A - A)^\perp$, the orthogonal complement of the
group generated by $A - A$. Also, recall that $\mathrm{Sym}_1(A) := \{h \in Z : A + h = A\}$
is the symmetry group of $A$; show that the orthogonal complement
$\mathrm{Sym}_1(A)^\perp$ of this group is the smallest group which contains $\mathrm{Spec}_\alpha(A)$
for all $\alpha > 0$.

4.6.3 Let $A$ be an additive set in a finite additive group $Z$, and let $0 < \alpha \leq 1$.
Establish the inequalities

$$\alpha^4|\mathrm{Spec}_\alpha(A)|\mathbf{P}_Z(A) \leq \frac{E(A, A)}{|A|^3} \leq |\mathrm{Spec}_\alpha(A)|\mathbf{P}_Z(A) + \alpha^2.$$

Thus, large energy forces large spectrum (and conversely).

4.6.4 Let $0 < \alpha \leq 1$, and let $A, B$ be additive sets in $Z$ with $|A| = |B| = N$
and $E(A, B) \geq 4\alpha^2 N^3$. Show that $|\mathrm{Spec}_\alpha(A) \cap \mathrm{Spec}_\alpha(B)| \geq 2\alpha^2|Z|/N$. Thus
pairs of sets with large additive energy must necessarily have a large amount
of shared spectrum.

4.6.5 If $A$ is an additive set in a finite additive group $Z$, and $A'$ is an additive
set in a finite additive group $Z'$, show that $\mathrm{Spec}_\alpha(A) \times \mathrm{Spec}_\beta(A') \subseteq
\mathrm{Spec}_{\alpha\beta}(A \times A')$ for all $0 < \alpha, \beta \leq 1$, where we give $Z \times Z'$ the bilinear
form induced from $Z$ and $Z'$.




4.6.6 Show that Theorem 4.41 implies Corollary 2.62. (Hint: use (4.14).) 

4.6.7 Let $S$ be a subset of a finite additive group $Z$, and let $0 < \rho < 1/4$. Show
that if $A$ is any additive set in $\mathrm{Bohr}(S, \rho)$, then $S \subseteq \mathrm{Spec}_{\cos(2\pi\rho)}(A)$. This
can be viewed as a kind of converse to Proposition 4.39.


4.7 Progressions in sum sets 


A cornerstone of additive combinatorics is Szemerédi's theorem. One form of this
theorem states that if $A$ is a subset of the interval $[1, N]$ with positive density $\alpha$,
then $A$ contains an arithmetic progression of length $f(N, \alpha)$, where $f$ tends to
infinity as $N$ does and $\alpha$ is fixed. In Chapters 10 and 11, we will discuss this result
in more detail, but let us mention here that $f$ tends to infinity very slowly as a
function of $N$.

In this section, we are going to show that if we replace the additive set $A$ by a
larger set, such as $A + B$, $A + A + A$, or $2A - 2A$, then one can locate significantly
larger progressions inside these sets by taking advantage of the existence of
functions supported on those sets with good Fourier transform, namely $1_A * 1_B$,
$1_A * 1_A * 1_A$ and $1_A * 1_A * 1_{-A} * 1_{-A}$.

To illustrate this, we begin with a theorem of Chang (based on earlier work of 
Ruzsa [295]) which demonstrates the existence of a large generalized progression 
inside 2A — 2A; this theorem will be a key ingredient in one of the formulations 
of Freiman’s theorem (see Theorem 5.30). 


Theorem 4.42 (Chang's theorem) [48] Let $K, N \geq 1$. Let $A$ be an additive set
in a cyclic group $Z = Z_N$ such that $E(A, A) \geq |A|^3/K$. Then there exists a proper
progression $P \subseteq 2A - 2A$ of rank at most $O(K(1 + \log\frac{1}{\mathbf{P}_Z(A)}))$ and size

$$|P| \geq O\Big(K\Big(1 + \log\frac{1}{\mathbf{P}_Z(A)}\Big)\Big)^{-O(K(1 + \log\frac{1}{\mathbf{P}_Z(A)}))} N. \tag{4.41}$$

Furthermore we may choose $P$ to be symmetric ($-P = P$).


Note from (2.8) that the hypothesis $E(A, A) \geq |A|^3/K$ will be obeyed if
$|A + A| \leq K|A|$ or $|A - A| \leq K|A|$; thus this theorem covers the case of sets
with small doubling constant or small Ruzsa diameter. Alternatively, from the
trivial bound $E(A, A) \geq |A|^4/|Z|$ we see this hypothesis is always satisfied with
$K = 1/\mathbf{P}_Z(A)$, but this is costly as the dependence of (4.41) on $K$ is exponential.
On the other hand, if $A$ has small doubling then this theorem can be applied
efficiently even when $A$ is a rather sparse subset of $Z$.




Proof Set $\alpha := 1/(2K^{1/2})$. By Proposition 4.39, we have

$$\mathrm{Bohr}\Big(\mathrm{Spec}_\alpha(A), \frac16\Big) \subseteq 2A - 2A.$$

On the other hand, from Lemma 4.36 we can find a set $S := \{\eta_1, \ldots, \eta_d\}$ of
frequencies with

$$d = O\Big(\alpha^{-2}\Big(1 + \log\frac{1}{\mathbf{P}_Z(A)}\Big)\Big) = O\Big(K\Big(1 + \log\frac{1}{\mathbf{P}_Z(A)}\Big)\Big)$$

such that

$$\mathrm{Spec}_\alpha(A) \subseteq [-1, 1]^d \cdot (\eta_1, \ldots, \eta_d).$$

This implies (from the triangle inequality) that

$$\mathrm{Bohr}\Big(S, \frac{1}{6d}\Big) \subseteq \mathrm{Bohr}\Big(\mathrm{Spec}_\alpha(A), \frac16\Big).$$

Applying Proposition 4.23 we see that $\mathrm{Bohr}(S, \frac{1}{6d})$ contains a proper symmetric
progression $P$ of rank $d$ and cardinality

$$|P| \geq O\Big(K\Big(1 + \log\frac{1}{\mathbf{P}_Z(A)}\Big)\Big)^{-O(K(1 + \log\frac{1}{\mathbf{P}_Z(A)}))} N$$

and the claim follows.


In the proof of the above theorem (or more precisely, in the proof of
Proposition 4.39) one took advantage of the fact that $1_A * 1_A * 1_{-A} * 1_{-A}$ had
positive Fourier coefficients $|\widehat{1_A}(\xi)|^4$. However, it turns out that with a slight
modification to the argument one does not need positivity of the Fourier coefficients,
and in fact one only needs three summands instead of four:


Theorem 4.43 [149] Let $K, N \geq 1$. Let $A_1, A_2, A_3$ be additive sets in $Z_N$
such that $|A_1| = |A_2| = |A_3|$ and $|A_1 + A_2 + A_3| \leq K|A_1|$. Then there exists a
proper progression $P \subseteq A_1 + A_2 + A_3$ of rank at most $O(K^2(1 + \log\frac{1}{\mathbf{P}_Z(A_1)}))$ and
size

$$|P| \geq O\Big(K^2\Big(1 + \log\frac{1}{\mathbf{P}_Z(A_1)}\Big)\Big)^{-O(K^2(1 + \log\frac{1}{\mathbf{P}_Z(A_1)}))} N. \tag{4.42}$$

One can of course generalize the hypotheses to deal with sets $A_1, A_2, A_3$ of
differing cardinalities, but the statement of the theorem becomes a little messier
and we do not pursue it here.




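Proof We adapt some arguments of [117]. We consider the non-negative
function $f := 1_{A_1} * 1_{A_2} * 1_{A_3}$. From (4.10) we have $\mathbf{E}_Z f = \mathbf{P}_Z(A_1)^3$. On the
other hand, we have $\mathbf{P}_Z(\mathrm{supp}(f)) = \mathbf{P}_Z(A_1 + A_2 + A_3) \leq K\mathbf{P}_Z(A_1)$. By the
pigeonhole principle, we can thus find an element $x_0 \in A_1 + A_2 + A_3$ such that
$f(x_0) \geq \mathbf{P}_Z(A_1)^2/K$. By translating one of the $A_j$ if necessary, we may assume
$x_0 = 0$, thus $f(0) \geq \mathbf{P}_Z(A_1)^2/K$.

Next, we observe from (4.9) that $\hat f(\xi) = \widehat{1_{A_1}}(\xi)\widehat{1_{A_2}}(\xi)\widehat{1_{A_3}}(\xi)$. From (4.4),
Cauchy-Schwarz, (4.16) and (4.24) we thus have for any $x \in Z$

$$\begin{aligned}
|f(x) - f(0)| &= \Big|\sum_{\xi\in Z}\widehat{1_{A_1}}(\xi)\widehat{1_{A_2}}(\xi)\widehat{1_{A_3}}(\xi)(e(\xi\cdot x) - 1)\Big| \\
&\leq \sum_{\xi\in Z}|\widehat{1_{A_1}}(\xi)||\widehat{1_{A_2}}(\xi)||\widehat{1_{A_3}}(\xi)||e(\xi\cdot x) - 1| \\
&\leq \Big(\sup_{\xi\in Z}|\widehat{1_{A_1}}(\xi)||e(\xi\cdot x) - 1|\Big)\|\widehat{1_{A_2}}\|_{l^2(Z)}\|\widehat{1_{A_3}}\|_{l^2(Z)} \\
&= \mathbf{P}_Z(A_1)\sup_{\xi\in Z}|\widehat{1_{A_1}}(\xi)||e(\xi\cdot x) - 1| \\
&\leq 2\pi\mathbf{P}_Z(A_1)\sup_{\xi\in Z}|\widehat{1_{A_1}}(\xi)|\,\|\xi\cdot x\|_{\mathbf{R}/\mathbf{Z}}.
\end{aligned}$$

Combining this with our bound on $f(0)$ and the support of $f$, we see that

$$\Big\{x \in Z : \sup_{\xi\in Z}|\widehat{1_{A_1}}(\xi)|\,\|\xi\cdot x\|_{\mathbf{R}/\mathbf{Z}} < \mathbf{P}_Z(A_1)/2\pi K\Big\} \subseteq A_1 + A_2 + A_3.$$

Since $|\widehat{1_{A_1}}(\xi)|\,\|\xi\cdot x\|_{\mathbf{R}/\mathbf{Z}} < \mathbf{P}_Z(A_1)/2\pi K$ whenever $\xi \notin \mathrm{Spec}_{1/2\pi K}(A_1)$, we
obtain

$$\Big\{x \in Z : \sup_{\xi\in\mathrm{Spec}_{1/2\pi K}(A_1)}|\widehat{1_{A_1}}(\xi)|\,\|\xi\cdot x\|_{\mathbf{R}/\mathbf{Z}} < \mathbf{P}_Z(A_1)/2\pi K\Big\} \subseteq A_1 + A_2 + A_3.$$

Moreover, as $|\widehat{1_{A_1}}(\xi)| \leq \mathbf{P}_Z(A_1)$ for all non-zero $\xi$, we obtain

$$\mathrm{Bohr}(\mathrm{Spec}_{1/2\pi K}(A_1), 1/2\pi K) \subseteq A_1 + A_2 + A_3$$

(for instance). But by Lemma 4.36 we can find $d = O(K^2(1 + \log\frac{1}{\mathbf{P}_Z(A_1)}))$ and
frequencies $S := \{\eta_1, \ldots, \eta_d\} \subset Z$ such that

$$\mathrm{Spec}_{1/2\pi K}(A_1) \subseteq [-1, 1]^d \cdot (\eta_1, \ldots, \eta_d)$$

and hence by the triangle inequality

$$\mathrm{Bohr}(S, 1/2\pi dK) \subseteq \mathrm{Bohr}(\mathrm{Spec}_{1/2\pi K}(A_1), 1/2\pi K) \subseteq A_1 + A_2 + A_3.$$

Applying Proposition 4.23, we can locate a proper progression $P$ in
$\mathrm{Bohr}(S, 1/2\pi dK)$ of rank $d$ and cardinality at least

$$|P| \geq \frac{(1/2\pi dK)^d}{d^d} N \geq \Big(CK^2\Big(1 + \log\frac{1}{\mathbf{P}_Z(A_1)}\Big)\Big)^{-CK^2(1 + \log\frac{1}{\mathbf{P}_Z(A_1)})} N$$

and the claim follows.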





The above arguments relied crucially on having three or more summands;
roughly speaking, two of the summands were treated by Plancherel's theorem,
leaving at least one other summand to be free to exploit the smallness of its Fourier
coefficients outside of its spectrum. They break down quite significantly for sums
of two sets$^1$. Nevertheless, it is still possible to obtain some relatively large progressions
in a set of the form $A + B$, because the function $1_A * 1_B$ still has $l^1$ type
control on the Fourier coefficients. We follow the arguments of Bourgain [36]. We
first give a convenient criterion for establishing the existence of progressions.


Lemma 4.44 (Almost periodicity implies long progressions) [36] Let $f : Z \to \mathbf{R}^+$
be a non-negative random variable on an additive group $Z$, let $J \geq 1$ be an
integer, and suppose that $r \in Z$ is such that

$$\mathbf{E}_Z \max_{1\leq j\leq J}|T^{jr}f - f| < \mathbf{E}_Z f,$$

where $T^{jr}f(x) := f(x - jr)$ is the shift of $f$ by $jr$. Then $\mathrm{supp}(f)$ contains an
arithmetic progression $a + [0, J]\cdot r$ of length $J + 1$ and spacing $r$.

Proof By the pigeonhole principle, there exists $x \in Z$ such that

$$\max_{1\leq j\leq J}|T^{jr}f(x) - f(x)| < f(x)$$

and hence $f(x - jr) = T^{jr}f(x) > 0$ for all $0 \leq j \leq J$. The claim follows.
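The pigeonhole step of Lemma 4.44 is constructive, and can be sketched as follows for functions on $Z_N$ stored as lists. The function name and the example (an unnormalized version of $1_A * 1_B$ for two intervals) are assumptions; since the hypothesis compares two averages, any positive multiple of $f$ satisfies it exactly when $f$ does.

```python
def progression_from_almost_periodicity(f, N, J, r):
    """Return [x, x - r, ..., x - J*r] inside supp(f), or None if the hypothesis fails.

    f is a list of N non-negative numbers, viewed as a function on Z_N.
    """
    deviation = [max(abs(f[(x - j * r) % N] - f[x]) for j in range(1, J + 1))
                 for x in range(N)]
    if sum(deviation) >= sum(f):
        return None                     # E max_j |T^{jr} f - f| < E f does not hold
    # Pigeonhole: since the averages are strictly ordered, some point x has deviation < f(x).
    x = next(x for x in range(N) if deviation[x] < f[x])
    return [(x - j * r) % N for j in range(J + 1)]

N = 101
A = B = set(range(40))
f = [0] * N
for a in A:
    for b in B:
        f[(a + b) % N] += 1             # a positive multiple of 1_A * 1_B
progression = progression_from_almost_periodicity(f, N, J=5, r=1)
print(progression)
```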


To apply this lemma, we need to estimate expressions of the form
$\mathbf{E}_Z \max_{1\leq j\leq J}|T^{jr}f - f|$. This can be done easily if $f$ has Fourier transform
supported in a dissociated set:


Lemma 4.45 [36] Let $S \subset Z$ be a dissociated set, and let $f$ be a random variable
such that $\mathrm{supp}(\hat f) \subseteq S$. Then for any non-empty set of shifts $H \subset Z$ we have

$$\Big\|\max_{h\in H}|T^h f|\Big\|_{L^2(Z)} = O\big((1 + \log|H|)^{1/2}\|f\|_{L^2(Z)}\big).$$


1 There is a similarity with the Goldbach conjectures. The weak conjecture — every large odd number 
is the sum of three primes — has been solved by Fourier methods, whereas the strong conjecture — 
every large even number is the sum of two primes — is still open, and probably not amenable to a 
purely Fourier-analytic method. 




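Proof Let $p > 2$ be a large exponent to be chosen later. Then

$$\begin{aligned}
\Big\|\max_{h\in H}|T^h f|\Big\|_{L^2(Z)} &\leq \Big\|\max_{h\in H}|T^h f|\Big\|_{L^p(Z)} \\
&\leq \Big\|\Big(\sum_{h\in H}|T^h f|^p\Big)^{1/p}\Big\|_{L^p(Z)} \\
&= |H|^{1/p}\|f\|_{L^p(Z)} \\
&\leq |H|^{1/p}\|S\|_{\Lambda(p)}\|f\|_{L^2(Z)} \\
&= O\big(|H|^{1/p}\sqrt{p}\,\|f\|_{L^2(Z)}\big)
\end{aligned}$$

by Rudin's inequality (Lemma 4.33). The claim now follows by setting
$p := O(1 + \log|H|)$.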














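By combining this lemma with Lemma 4.35, we can obtain an estimate when
$\mathrm{supp}(\hat f)$ is not dissociated, but $\hat f$ is uniform in size:


Lemma 4.46 [36] Let $f$ be a random variable, and let $J, d \geq 1$. Suppose that
there exists an integer $m$ such that $2^m \leq |\hat f(\xi)| \leq 2^{m+1}$ for all $\xi \in \mathrm{supp}(\hat f)$. Then
one can find a set $S \subset Z$ of cardinality $|S| = d$ such that

$$\mathbf{E}_Z \max_{1\leq j\leq J}|T^{jr}f - f| = O\Big(\sum_{\xi\in\mathrm{supp}(\hat f)}|\hat f(\xi)|\Big(\sqrt{\frac{\log J}{d}} + Jd\max_{\eta\in S}\|\eta\cdot r\|_{\mathbf{R}/\mathbf{Z}}\Big)\Big)$$

for all $r \in Z$.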


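Proof Applying Lemma 4.35, we may write

$$\mathrm{supp}(\hat f) = D_1 \cup \cdots \cup D_k \cup R$$

where $D_1, \ldots, D_k$ are disjoint dissociated sets of cardinality $d + 1$, and
$R \subseteq [-1, 1]^d \cdot (\eta_1, \ldots, \eta_d)$ for some $S = \{\eta_1, \ldots, \eta_d\} \subset Z$. Using the Fourier
transform, we may then split $f = f_{D_1} + \cdots + f_{D_k} + f_R$ accordingly. From
Lemma 4.45 we have, for any $1 \leq i \leq k$,

$$\begin{aligned}
\mathbf{E}_Z \max_{1\leq j\leq J}|T^{jr}f_{D_i} - f_{D_i}| &\leq 2\Big\|\max_{0\leq j\leq J}|T^{jr}f_{D_i}|\Big\|_{L^2(Z)} \\
&\leq O\big(\log^{1/2}J\,\|f_{D_i}\|_{L^2(Z)}\big) \\
&= O\Big(\log^{1/2}J\,\Big(\sum_{\xi\in D_i}|\hat f(\xi)|^2\Big)^{1/2}\Big) \\
&\leq O\Big(\sqrt{\frac{\log J}{d}}\sum_{\xi\in D_i}|\hat f(\xi)|\Big)
\end{aligned}$$

thanks to the uniformity assumption $2^m \leq |\hat f(\xi)| \leq 2^{m+1}$. Also, we have from the
triangle inequality, (4.24) and the hypothesis on $R$

$$\begin{aligned}
\Big\|\max_{1\leq j\leq J}|T^{jr}f_R - f_R|\Big\|_{L^1(Z)} &= \Big\|\max_{1\leq j\leq J}\Big|\sum_{\xi\in R}\hat f(\xi)\big(e(\xi\cdot(x - jr)) - e(\xi\cdot x)\big)\Big|\Big\|_{L^1(Z)} \\
&\leq \Big(\sum_{\xi\in R}|\hat f(\xi)|\Big)\max_{\xi\in R}\max_{1\leq j\leq J}|e(jr\cdot\xi) - 1| \\
&\leq 2\pi Jd\Big(\sum_{\xi\in R}|\hat f(\xi)|\Big)\max_{\eta\in S}\|\eta\cdot r\|_{\mathbf{R}/\mathbf{Z}}.
\end{aligned}$$

Summing these estimates using the triangle inequality, the claim follows.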
Now we can prove Bourgain’s theorem. 


Theorem 4.47 [36] Let N > 1 beaprime number, and let A, B be additive sets in 
Zy such that |A|, |B| > êN for some cue < ô < 1 for some large absolute 
constant C > 1. Then A + B contains a proper arithmetic progression of length 
at least exp(Q2(6 log N)!3). 


Proof We may assume N to be large. By removing elements from A and B and 
increasing ô if necessary we may assume P7(A) = Pz(B) = ô. Set f := 14 * 1p, 
and let exp(Q2(6 log N)!/3) < J < N be chosen later: thus supp( f) = A + B and 
Ez f = Pz(A)Pz(B) = 57; note also that J >> 1/5. By Lemma 4.44, it suffices to 
show that 


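$$\mathbf{E}_Z \max_{1\leq j\leq J}|T^{jr}f - f| < \delta^2$$

for some non-zero $r$.

The Fourier coefficients $\hat f$ of $f$ cannot exceed $\hat f(0) = \mathbf{E}_Z f = \delta^2$. Furthermore
we have by Cauchy-Schwarz and (4.16)

$$\begin{aligned}
\sum_{\xi\in Z}|\hat f(\xi)| &= \sum_{\xi\in Z}|\widehat{1_A}(\xi)||\widehat{1_B}(\xi)| \\
&\leq \|\widehat{1_A}\|_{l^2(Z)}\|\widehat{1_B}\|_{l^2(Z)} \\
&= \mathbf{P}_Z(A)^{1/2}\mathbf{P}_Z(B)^{1/2} \\
&= \delta.
\end{aligned} \tag{4.43}$$

To exploit this, we let $M \geq 1$ be chosen later and partition

$$Z = \bigcup_{0\leq m<M}\Gamma_m \cup \Gamma_{\mathrm{err}}$$

where $\Gamma_m := \{\xi \in Z : 2^{-m-1}\delta^2 < |\hat f(\xi)| \leq 2^{-m}\delta^2\}$ and $\Gamma_{\mathrm{err}} := \{\xi \in Z :
|\hat f(\xi)| \leq 2^{-M}\delta^2\}$. This induces a splitting

$$f = \sum_{0\leq m<M} f_m + f_{\mathrm{err}}.$$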
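We can apply Lemma 4.46 to each $f_m$, with $d \geq 1$ to be chosen later, to obtain

$$\mathbf{E}_Z \max_{1\leq j\leq J}|T^{jr}f_m - f_m| = O\Big(\sum_{\xi\in\Gamma_m}|\hat f_m(\xi)|\Big(\sqrt{\frac{\log J}{d}} + Jd\max_{\eta\in S_m}\|\eta\cdot r\|_{\mathbf{R}/\mathbf{Z}}\Big)\Big)$$

where $S_m$ is a set of frequencies of cardinality $|S_m| = d$; summing this in $m$ and
using (4.43) we obtain

$$\sum_{0\leq m<M}\mathbf{E}_Z \max_{1\leq j\leq J}|T^{jr}f_m - f_m| = O\Big(\delta\Big(\sqrt{\frac{\log J}{d}} + Jd\max_{\eta\in S}\|\eta\cdot r\|_{\mathbf{R}/\mathbf{Z}}\Big)\Big)$$

where $S := \bigcup_{0\leq m<M}S_m$ is a set of frequencies of cardinality $|S| \leq dM$. As for
$f_{\mathrm{err}}$, we crudely use the triangle inequality:

$$\begin{aligned}
\mathbf{E}_Z \max_{1\leq j\leq J}|T^{jr}f_{\mathrm{err}} - f_{\mathrm{err}}| &\leq \sum_{1\leq j\leq J}\|T^{jr}f_{\mathrm{err}} - f_{\mathrm{err}}\|_{L^1(Z)} \\
&\leq 2\sum_{1\leq j\leq J}\|T^{jr}f_{\mathrm{err}}\|_{L^2(Z)} \\
&= 2J\Big(\sum_{\xi\in\Gamma_{\mathrm{err}}}|\hat f(\xi)|^2\Big)^{1/2} \\
&\leq 2J\,2^{-M/2}\delta\,\|\hat f\|_{l^1(Z)}^{1/2} \\
&\leq 2J\,2^{-M/2}\delta^{3/2}.
\end{aligned}$$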


Combining these estimates using the triangle inequality, we see that to conclude
the theorem we need to find an $r \neq 0$ such that

$$\sqrt{\frac{\log J}{d}} + Jd\max_{\eta\in S}\|\eta\cdot r\|_{\mathbf{R}/\mathbf{Z}} + J2^{-M/2}\delta^{1/2} < c\delta$$

for some small absolute constant $c > 0$. If we choose $M := C\log J$ and $d :=
C\delta^{-2}\log J$ for a sufficiently large $C$, then it is clear the first and third terms will
be less than $c\delta/3$ (recalling that $J \gg 1/\delta$), and so it will suffice to find an $r \neq 0$
such that

$$\max_{\eta\in S}\|\eta\cdot r\|_{\mathbf{R}/\mathbf{Z}} < \frac{c\delta}{3Jd} = \frac{c'\delta^3}{J\log J}$$

where $c' > 0$ is another small absolute constant. Using Lemma 4.20, we see that
this is possible provided that

$$\Big(\frac{c'\delta^3}{J\log J}\Big)^{|S|} N > 1.$$

But since

$$|S| \leq dM = O(\delta^{-2}\log^2 J)$$

we see that we can achieve this by setting J := exp(c”(5 log N)'/?) for a suitably 
small c”, using the lower bound hypothesis on ô. The claim follows. 














The length exp(QQ(6 log N 3) was recently extended to exp(Q2Q(6 log N )!⁄2) 
(and the condition on ô relaxed slightly to C wie < 6 by Green [149], by an 
interesting variational Fourier argument which we briefly sketch here. The starting 
point is Exercise 4.3.12. One takes a non-empty set $E$ of some fixed density $\mathbf{P}_Z(E) = \varepsilon$, to
be chosen later, which is disjoint from $A + B$ and minimizes the quantity $\|E\|_u$
subject to the above constraints; Exercise 4.3.12 thus places a lower bound on this
quantity $\|E\|_u$. One then considers the $\alpha$-spectrum $\mathrm{Spec}_\alpha(E)$ of $E$, for some $\alpha$ to
be chosen later, and uses Lemma 4.36 to place this spectrum inside a progression
$[-1, 1]^d \cdot (\eta_1, \ldots, \eta_d)$ for some set $S = \{\eta_1, \ldots, \eta_d\}$ of frequencies which is not
too large. Next, one removes a small number of elements (chosen at random) from
$E$ and replaces them by generic elements of $Z$; by Lemma 4.16 this shrinks the
Fourier bias of $E$ with high probability. Next, one takes these new generic elements
of $Z$ and translates them by a suitable element of $\mathrm{Bohr}(S, \rho)$ (for some suitably
small $\rho$) to try to place all of them outside $A + B$. This operation, if successful, will
not significantly affect the Fourier transform of $E$ on the large spectrum $\mathrm{Spec}_\alpha(E)$ and
should thus still shrink the Fourier bias of $E$. But this contradicts the construction
of $E$. Thus it must not be possible to translate one of the generic elements outside
of $A + B$, which means that $A + B$ necessarily contains a translate of $\mathrm{Bohr}(S, \rho)$.
From this and Proposition 4.23 one then establishes a large progression inside
$A + B$. For more details (such as the selection of the parameters $\varepsilon$, $\alpha$, $\rho$), see [149].

On the other hand, an example of Ruzsa [290] shows that even when ô is close 
to 1/2, one can find sets A + A which do not contain any progressions of length 
exp(QQ(log NY), 

The arithmetic progressions inside iterated sum sets have been intensively stud- 
ied in [350]; we discuss this in detail in Chapter 12. 


Exercises 


4.7.1 [149] Let A1, A2, A3 C [1, N] be additive sets of integers such that 
|Ai| = |A2| = |A3| > 6N for some 0 < ô < 1/2. Show that A; + A; 




contains a proper arithmetic progression of length at least 
exp(Q(6 log N)!/3 — O(log log N)), A; + A2 + A3 contains a proper 
arithmetic progression of length at least O(82) N®/!81/8) and that 
2A, — 2A , contains a proper arithmetic progression of length at least 
O(8°) y®/l081/8), (Hint: embed Aj, A2, A3 in Zp for some prime 
2N < p < 4N and apply the theorems of this section, followed by 
Exercise 3.2.5. One needs to pass from a progression in Zy back to one 
in Z; one tool for this is Corollary 3.25.) 

4.7.2 [349] Let $P$ be a proper arithmetic progression in a torsion-free additive
group, and let $A, B$ be additive sets in $P$ such that $|A|, |B| \geq
(1 - \varepsilon)|P|$ for some $0 < \varepsilon < 1/4$. Prove that $A + B$ contains a proper
arithmetic progression of length at least $(2 - 4\varepsilon)|P| - 1$. (Hint: work
with those elements of $P + P$ which have at least $2\varepsilon|P|$ representations
as sums of elements of $P$.)


5 





Inverse sum set theorems 


In Chapter 2 we established the elementary theory of sum set estimates, showing 
how information on one sum $A + B$ can be used to control other sums such as
$A - B$ or $nA - mA$. These estimates worked reasonably well even when the doubling
constants of the sets involved were fairly large, since all the bounds were
polynomial in this constant. On the other hand, we did not get detailed structural
information on sets with small doubling constant; the best we could do is cover 
them by an approximate group (Proposition 2.26). 

In this chapter we shall focus on the following question: given two additive sets 
A, B with A + B very small, what is the strongest structural statement one can 
then conclude about A and B? One of the main results in this area is Freiman’s 
theorem which (in the torsion-free case) asserts that an additive set $A$ with small
doubling constant $\sigma[A] = |2A|/|A|$ is contained in a progression of bounded rank
which is not much larger than the original set. This theorem comes in a number
of variants; we give several of them below. In doing so we shall also come across 
the useful concept of a Freiman homomorphism, which to a large extent frees the 
study of additive sets from the ambient group that they reside in, giving rise to a 
number of useful tricks, such as embedding the set inside a particularly nice group. 


5.1 Minimal size of sum sets and the e-transform 


Before we begin with inverse theorems, we first address an even more basic ques- 
tion: given the cardinalities |A|, |B| of two additive sets A and B in an ambient 
group Z, what is the least possible cardinality |A + B| of the sum set A + B? If 
we allow the group Z to be completely arbitrary, then the answer is given by (2.1) 
and Proposition 2.2, thus |A + B| > max(|A|, |B|), with equality if and only if 
one of the sets is contained inside a coset of a finite group G, and the other set is 
a finite union of cosets of $G$. However, for specific choices of $Z$, one can improve
this bound somewhat. For instance, if $Z$ is the integers, then $Z$ contains no finite
subgroups other than the trivial one {0}, and so we expect to do better than (2.1) 
unless one of |A|, |B| is equal to 1. 

A very simple, but surprisingly powerful, tool for establishing the minimal size 
of sum sets is the e-transform, which we now define. 


Definition 5.1 (e-transform) [73] Let $A, B$ be additive sets in an ambient group
$Z$, and let $e \in A - B$. We define the $e$-transform of the pair $A, B$ to be the sets
$A_{(e)} := A \cup (B + e)$ and $B_{(e)} := B \cap (A - e)$.


One can view this transform as removing the elements of B\(A — e) from B 
and transferring them to A (after translating them by e). The main point of the 
e-transform is that it shrinks (or keeps constant) the size |A + B| of the sum set, 
while maintaining the total size |A| + |B| of A and B. More precisely: 


Lemma 5.2 [73] Let A, B be additive sets in an ambient group Z, let $e \in A - B$,
and let $A_{(e)}$, $B_{(e)}$ be the e-transform of A, B. Then $A_{(e)}$ and $B_{(e)}$ are also additive
sets (i.e. finite and non-empty), and

$$A_{(e)} + B_{(e)} \subseteq A + B. \quad (5.1)$$

Furthermore we have

$$|A_{(e)}| + |B_{(e)}| = |A| + |B|, \quad (5.2)$$

and more generally

$$|A_{(e)} \cap E| + |B_{(e)} \cap E| = |A \cap E| + |B \cap E| + |(B \setminus (A - e)) \cap ((E - e) \setminus E)| - |(B \setminus (A - e)) \cap (E \setminus (E - e))| \quad (5.3)$$

for any $E \subseteq Z$. Finally, we have

$$|A_{(e)}| \geq |A|, \qquad |B_{(e)}| \leq |B| \quad (5.4)$$

with equality in either expression if and only if $B + e \subseteq A$.
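
The e-transform is easy to experiment with for finite sets of integers. The following is a minimal sketch, not taken from the text: the helper names `sumset` and `e_transform` are illustrative assumptions, and the loop simply checks (5.1), (5.2) and (5.4) on one small pair of sets.

```python
def sumset(A, B):
    """Return the sum set A + B = {a + b : a in A, b in B}."""
    return {a + b for a in A for b in B}


def e_transform(A, B, e):
    """Return the pair (A_e, B_e) with A_e = A | (B + e) and B_e = B & (A - e).

    The parameter e must lie in the difference set A - B.
    """
    if e not in {a - b for a in A for b in B}:
        raise ValueError("e must belong to A - B")
    A_e = A | {b + e for b in B}
    B_e = B & {a - e for a in A}
    return A_e, B_e


A = {0, 1, 5}
B = {0, 2, 3}
for e in sorted({a - b for a in A for b in B}):
    A_e, B_e = e_transform(A, B, e)
    assert sumset(A_e, B_e) <= sumset(A, B)             # inclusion (5.1)
    assert len(A_e) + len(B_e) == len(A) + len(B)       # identity (5.2)
    assert len(A_e) >= len(A) and len(B_e) <= len(B)    # inequalities (5.4)
    print(e, sorted(A_e), sorted(B_e))
```

For e = max(A) - min(B) = 5 the transformed set B_e collapses to the single element min(B), which is the choice used in the proof of Lemma 5.3.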


We leave the easy proof of this lemma to Exercise 5.1.2. We now give some 
applications of this Lemma. First we obtain the minimal size of sum sets in the inte- 
gers Z (cf. Lemma 3.18), taking advantage of the fact that the integers are ordered. 


Lemma 5.3 If A and B are additive sets in $\mathbf{Z}$, then we have $|A + B| \geq |A| + |B| - 1$.


Proof Let $e := \max(A) - \min(B)$; then we see that $B_{(e)}$ is the singleton set
$\{\min(B)\}$, and thus by (5.2) $|A_{(e)}| = |A| + |B| - 1$, so $|A_{(e)} + B_{(e)}| = |A| + |B| - 1$.
The claim now follows from (5.1).
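
As a sanity check on Lemma 5.3, one can enumerate all pairs of non-empty subsets of a short interval of integers. This is only an illustrative brute-force sketch; the function names `nonempty_subsets` and `min_sumset_gap` are assumptions made for the example.

```python
from itertools import combinations


def nonempty_subsets(universe):
    """Yield every non-empty subset of the given finite collection."""
    items = list(universe)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield set(combo)


def min_sumset_gap(limit):
    """Smallest value of |A + B| - (|A| + |B| - 1) over non-empty A, B in [0, limit)."""
    best = None
    for A in nonempty_subsets(range(limit)):
        for B in nonempty_subsets(range(limit)):
            gap = len({a + b for a in A for b in B}) - (len(A) + len(B) - 1)
            best = gap if best is None else min(best, gap)
    return best


# Lemma 5.3 says the gap is never negative; singletons show it can be zero.
print(min_sumset_gap(6))
```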

Now we prove a similar result in a cyclic group $\mathbf{Z}_p$ of prime order. Here the
key fact to exploit is that $\mathbf{Z}_p$ contains no non-trivial subgroups.


Theorem 5.4 (Cauchy-Davenport inequality) [47], [68] If p is a prime, and
A, B are two additive sets in $\mathbf{Z}_p$, then

$$|A + B| \geq \min(|A| + |B| - 1, p).$$


This result was first discovered by Cauchy [47] and then rediscovered 122 years 
later by Davenport [68]. We remark that the corresponding result for restricted 
summation $A \hat{+} B := \{a + b : a \in A, b \in B, a \neq b\}$ requires different methods to
establish; see Section 9.2. We shall give alternative proofs of Theorem 5.4 in 
Section 9.2 and Section 9.8. 


Proof We induce on $|B|$; thus we suppose that the claim has already
been proven for all smaller sets B (the case $|B| = 1$ is trivial). Suppose we can
find an element $e \in A - B$ such that the e-transform $B_{(e)}$ of B is strictly smaller
than B. Then we have $|A_{(e)} + B_{(e)}| \geq \min(|A_{(e)}| + |B_{(e)}| - 1, p)$ by the induction
hypothesis, and the claim follows by (5.1) and (5.2). Thus we may assume that
none of the e-transforms of B are strictly smaller than B. Using Lemma 5.2, this
means that $B + e \subseteq A$ for all $e \in A - B$, so

$$A - B + B \subseteq A.$$

Using Proposition 2.2, we thus see that B is contained in a coset of a subgroup G
of $\mathbf{Z}_p$, and A is a union of cosets of G. But since p is prime, the only subgroups
G available are the trivial group $\{0\}$ and the full group $\mathbf{Z}_p$. In either case the
Cauchy-Davenport inequality is easily verified.
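
The role of primality can be seen numerically. The sketch below, with the assumed helper names `subsets` and `cauchy_davenport_holds`, checks the inequality exhaustively for a small modulus; it is an illustration rather than part of the argument.

```python
from itertools import combinations


def subsets(n):
    """Yield all non-empty subsets of Z_n = {0, ..., n-1}."""
    for k in range(1, n + 1):
        for combo in combinations(range(n), k):
            yield frozenset(combo)


def cauchy_davenport_holds(n):
    """True if |A + B| >= min(|A| + |B| - 1, n) for all non-empty A, B in Z_n."""
    all_sets = list(subsets(n))
    for A in all_sets:
        for B in all_sets:
            total = {(a + b) % n for a in A for b in B}
            if len(total) < min(len(A) + len(B) - 1, n):
                return False
    return True


print(cauchy_davenport_holds(7))   # prime modulus
print(cauchy_davenport_holds(6))   # composite: A = B = {0, 3} is a counterexample
```

For the composite modulus 6 the subgroup {0, 3} gives |A + B| = 2 < 3, which is exactly the kind of obstruction that the proof rules out when p is prime.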

One can generalize Lemma 5.3 and Theorem 5.4. Recall from Definition 2.32
that the symmetry group $\mathrm{Sym}_1(A)$ of an additive set A in an ambient group Z was
defined as $\mathrm{Sym}_1(A) := \{h \in Z : A + h = A\}$.


Theorem 5.5 (Kneser's theorem) [211] For any additive sets A, B in an additive
group Z, we have

$$|A + B| \geq |A + \mathrm{Sym}_1(A + B)| + |B + \mathrm{Sym}_1(A + B)| - |\mathrm{Sym}_1(A + B)| \geq |A| + |B| - |\mathrm{Sym}_1(A + B)|.$$


Proof We use a triple induction. First we induce upward on |A + B|, thus assum- 
ing that the claim has been proven for all pairs A, B with a smaller value of |A + B|. 
Next, with |A + B| fixed, we induce downward on |A| + |B| (which is bounded 
above by 2|A + B|), assuming the claim proven for larger values of |A| + |B|.
Finally, with |A + B| and |A| + |B| fixed, we induce upward on | B|, assuming the 
claim proven for smaller values of |B|. This rather complex induction is forced 
on us by the different reductions on A and B that we will use in the (surprisingly 
delicate) argument. 

Let $G := \mathrm{Sym}_1(A + B)$. If G is not the trivial group $\{0\}$, then we can pass from
Z to the quotient group Z/G, replacing A and B by (A + G)/G and (B + G)/G
and reducing the size of |A + B|, and the claim then follows from the first induction
hypothesis. Thus we may take $\mathrm{Sym}_1(A + B) = \{0\}$. Our task is then to show that
$|A + B| \geq |A| + |B| - 1$.

Suppose that $B_{(e)} = B$ for all $e \in A - B$. Then we have $A - B + B \subseteq A$ as
before, and so by Proposition 2.2, B is contained in a coset of a group H, and A
is a union of cosets of H. Then $\mathrm{Sym}_1(A + B)$ contains H and hence $H = \{0\}$,
which implies $|B| = 1$. The claim is then easily verified.

It remains to consider the case when $B_{(e)}$ is strictly smaller than B for at least
one $e \in A - B$. Among all such e, we choose one which maximizes the value of
$|B_{(e)}|$. By translating B (and $B_{(e)}$) by e if necessary we may normalize $e = 0$; thus
$A_{(0)} = A \cup B$ and $B_{(0)} = A \cap B$. Note from (5.1) and (5.3) that $|A_{(0)} + B_{(0)}| \leq |A + B|$, that
$|A_{(0)}| + |B_{(0)}| = |A| + |B|$, and $|B_{(0)}| < |B|$. Thus by the induction hypotheses we
have

$$|A_{(0)} + B_{(0)}| \geq |A_{(0)} + H| + |B_{(0)} + H| - |H|, \quad (5.5)$$

where $H := \mathrm{Sym}_1(A_{(0)} + B_{(0)})$. Let $C := (A \cap B) + H$. By definition of H and
$A_{(0)}$, $B_{(0)}$, we see that A + C and B + C are contained in $A_{(0)} + B_{(0)}$ and hence
in A + B. So we can replace A and B by $A \cup C$ and $B \cup C$ without affecting
A + B or $\mathrm{Sym}_1(A + B)$. Thus we may assume that C is contained in both A and B,
otherwise |A + C| + |B + C| would exceed |A| + |B| and the claim will follow
from the second induction hypothesis. In particular we see that $A \cap B = C$ is the
union of a non-zero number of cosets of H.

Suppose that $A_{(0)} + B_{(0)}$ is equal to A + B; then $H = \mathrm{Sym}_1(A + B) = \{0\}$, and
the claim follows from (5.5) and (5.3). Thus we may assume that $A_{(0)} + B_{(0)}$ is
strictly smaller than A + B.

Let $A'$ denote those elements $a \in A$ such that $a + b \notin A_{(0)} + B_{(0)}$ for some
$b \in B$. By the previous assumption, $A'$ is non-empty; also observe that a (and
hence a + H) is disjoint from $C = B_{(0)}$ for all $a \in A'$. Let b be such that $a + b \notin
A_{(0)} + B_{(0)}$; then a + b + H is disjoint from $A_{(0)} + B_{(0)}$ (by definition of H);
since $b \in A_{(0)}$, we conclude that a + H is disjoint from $A \cap B$. Also we have
$((a + H) \cap A) + b$ disjoint from $A_{(0)} + B_{(0)}$ and contained in A + B; thus

$$|A + B| \geq |A_{(0)} + B_{(0)}| + |(a + H) \cap A|.$$

Since $A \cap B$ is disjoint from a + H, we have

$$|A_{(0)} + H| \geq |A_{(0)}| + |((A_{(0)} + H) \setminus A_{(0)}) \cap (a + H)| = |A_{(0)}| + |H| - |(a + H) \cap A| - |(a + H) \cap B|$$

and hence by (5.5) and (5.3)

$$|A + B| \geq |A| + |B| - |(a + H) \cap B|.$$

Thus we will be done unless we have $|(a + H) \cap B| > 1$ for all $a \in A'$, which we
now assume.

For each $a \in A'$, let $A_a := (a + H) \cap A$ and $B_a := (a + H) \cap B$. Suppose we
can find $a, a' \in A'$ such that $A_a - B_a + B_{a'} \not\subseteq A_{a'}$. Then we can find $e \in A_a -
B_a \subseteq H$ such that $B_{a'} + e \not\subseteq A_{a'}$. This shows that B is not contained in A - e,
and thus $B_{(e)}$ is strictly smaller than B, and also contains both $B_{(0)} = C$ and the
non-empty set $B_a \cap (A_a - e)$ (which lies in a + H and is hence disjoint from C),
and is thus strictly larger than $B_{(0)}$. This contradicts the maximality of $|B_{(0)}|$. Thus
we must have $A_a - B_a + B_{a'} \subseteq A_{a'}$ for all $a, a' \in A'$. This implies in particular
that $|A_a| = |A_{a'}|$ for all $a, a' \in A'$, which by Proposition 2.2 implies that the $B_a$
are each contained in a coset of a fixed group K, and that the $A_a$ are unions of
cosets of K (in particular K is a subgroup of H). Since we are assuming that
$|B_a| > 1$ for all $a \in A'$, we have $|K| > 1$. Since $A_a + B$ is the union of cosets of
K for each a, and $A_{(0)} + B_{(0)}$ is a union of cosets of H, and hence K, we conclude
that A + B is the union of cosets of K. But this contradicts the hypothesis that
$\mathrm{Sym}_1(A + B) = \{0\}$, and we are done.
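
Kneser's theorem can be tested directly in a small cyclic group with non-trivial subgroups. The following sketch is an illustration under assumptions: the names `sumset`, `symmetry_group`, `kneser_holds` and `check_all` are invented for this example, and the ambient group is taken to be Z_n with elements 0, ..., n-1.

```python
from itertools import combinations


def sumset(A, B, n):
    """Sum set A + B computed in Z_n."""
    return frozenset((a + b) % n for a in A for b in B)


def symmetry_group(S, n):
    """Sym_1(S) = {h in Z_n : S + h = S} for a subset S of Z_n."""
    S = frozenset(S)
    return frozenset(h for h in range(n)
                     if frozenset((s + h) % n for s in S) == S)


def kneser_holds(A, B, n):
    """Check both inequalities of Kneser's theorem for the pair A, B in Z_n."""
    S = sumset(A, B, n)
    H = symmetry_group(S, n)
    middle = len(sumset(A, H, n)) + len(sumset(B, H, n)) - len(H)
    return len(S) >= middle >= len(A) + len(B) - len(H)


def check_all(n):
    """Exhaustively test every pair of non-empty subsets of Z_n."""
    sets = [frozenset(c) for k in range(1, n + 1)
            for c in combinations(range(n), k)]
    return all(kneser_holds(A, B, n) for A in sets for B in sets)


# In Z_12, A + B = {0, 1, 6, 7} has symmetry group {0, 6}.
print(sorted(symmetry_group(sumset({0, 1, 6, 7}, {0, 6}, 12), 12)))
print(check_all(6))
```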

As one application of Kneser’s theorem we give a complete classification of 
sets with very small doubling constant. 


Corollary 5.6 (Near-exact inverse sum set theorem) Let A be an additive set
in an ambient group Z. Then the following are equivalent:


• $\sigma[A] < \frac{3}{2}$ (i.e. $|A + A| < \frac{3}{2}|A|$);

• $\delta[A] < \frac{3}{2}$ (i.e. $|A - A| < \frac{3}{2}|A|$, or $d(A, A) < \log \frac{3}{2}$);

• $|A + B| < \frac{3}{2}|A|$ for some additive set B in Z with $|B| \geq |A|$;

• $|nA - mA| < \frac{3}{2}|A|$ for all non-negative integers n, m;

• $A \subseteq x + G$ for some $x \in Z$ and subgroup G of Z with $|G| < \frac{3}{2}|A|$.


This should be compared with Proposition 2.7 and Exercise 2.6.5. The factor
$\frac{3}{2}$ is sharp, as can be seen by the example $A = \{0, 1\}$ in the integers $\mathbf{Z}$, or more
generally $A = \{0, 1\} \times G$ in the group $\mathbf{Z} \times G$ for any finite group G.
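
The equivalence between the first and last conditions of Corollary 5.6 can be checked exhaustively in a small cyclic group. This is a hedged sketch: the ambient group Z_n and the function names `small_doubling`, `in_small_coset` and `corollary_holds` are assumptions chosen for the illustration.

```python
from itertools import combinations


def small_doubling(A, n):
    """True if |A + A| < (3/2)|A| in Z_n (integer arithmetic avoids rounding)."""
    return 2 * len({(a + b) % n for a in A for b in A}) < 3 * len(A)


def in_small_coset(A, n):
    """True if A lies in a coset x + G of a subgroup G of Z_n with |G| < (3/2)|A|."""
    for size in range(1, n + 1):
        if n % size or 2 * size >= 3 * len(A):
            continue
        step = n // size                 # G consists of the multiples of step
        if len({a % step for a in A}) == 1:
            return True
    return False


def corollary_holds(n):
    """Compare the two conditions on every non-empty subset of Z_n."""
    return all(small_doubling(set(A), n) == in_small_coset(set(A), n)
               for k in range(1, n + 1)
               for A in combinations(range(n), k))


print(small_doubling({0, 1}, 12))   # |A + A| = 3 = (3/2)|A|, so the strict bound fails
print(corollary_holds(12))
```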


Proof We shall only prove that the third claim implies the fifth; the other claims
are similar or trivial and are left as an exercise. From Kneser's theorem we have

$$\frac{3}{2}|A| > |A + B| \geq |A| + |B| - |\mathrm{Sym}_1(A + B)| \geq 2|A| - |\mathrm{Sym}_1(A + B)|;$$

hence if we set $G := \mathrm{Sym}_1(A + B)$, then $|G| > |A|/2$. Since $|A + B| < \frac{3}{2}|A|$ and
A + B is a union of cosets of its symmetry group G, we thus see that A + B is
equal to the union of at most two cosets in G, and $|G| < \frac{3}{2}|A|$. Suppose first that
A + B is the union of two cosets of G. Then $\frac{3}{2}|B| \geq \frac{3}{2}|A| > |A + B| = 2|G|$,
which implies that neither A nor B can be contained in a single coset of G. But
this contradicts Kneser's theorem again. Thus A + B is a single coset of G, which
implies that A is also contained in a coset of G. The claim follows.

Now we return to the integers, and obtain a more advanced version of 
Lemma 5.3. 


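Theorem 5.7 (Mann's theorem) [243] Let $N \geq 0$, let $0 \leq \alpha \leq 1$, and let A, B
be additive sets in $\mathbf{Z}$ such that $0 \in A, B$ and

$$|A \cap [1, n]| + |B \cap [1, n]| \geq \alpha n \quad (5.6)$$

for all $0 \leq n \leq N$. Then

$$|(A + B) \cap [1, n]| \geq \alpha n \quad \text{for all } 0 \leq n \leq N.$$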
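
To see the statement in action, the sketch below checks Mann's theorem on every pair of subsets of [0, N] containing 0, using for each pair the largest admissible value of alpha (capped at 1). The helper names `count`, `mann_holds` and `check_all` are assumptions made for this illustration, and exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction
from itertools import combinations


def count(S, n):
    """Return |S ∩ [1, n]|."""
    return sum(1 for x in S if 1 <= x <= n)


def mann_holds(A, B, N):
    """Check the conclusion of Theorem 5.7 for the largest alpha allowed by (5.6)."""
    alpha = min([Fraction(1)] +
                [Fraction(count(A, n) + count(B, n), n) for n in range(1, N + 1)])
    S = {a + b for a in A for b in B}
    return all(count(S, n) >= alpha * n for n in range(1, N + 1))


def check_all(N):
    """Test all pairs of subsets of [0, N] that contain 0."""
    sets = [{0} | set(c) for k in range(N + 1)
            for c in combinations(range(1, N + 1), k)]
    return all(mann_holds(A, B, N) for A in sets for B in sets)


print(check_all(6))
```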


Proof The claim is easily verified for $N = 0$, so let us assume inductively that
$N \geq 1$ and the claim has already been proven for all smaller N. In particular from
this induction hypothesis we already have

$$|(A + B) \cap [1, n]| \geq \alpha n \quad \text{for all } 0 \leq n < N$$

and so it suffices to prove that

$$|(A + B) \cap [1, N]| \geq \alpha N.$$

We now fix N and induce on $|B|$. If $|B| = 1$, then $B = \{0\}$ and the claim is
easily verified, so suppose that $|B| > 1$ and the claim has already been proven for
all smaller values of $|B|$. Without loss of generality we may take $A \subseteq [0, N]$ and
$B \subseteq [0, N]$ as the additional elements of A and B are clearly harmless.

In light of Lemma 5.2 and the induction hypothesis, it will suffice to find an
integer $e \in A \subseteq A - B$ such that $|B_{(e)}| < |B|$ and

$$|A_{(e)} \cap [1, n]| + |B_{(e)} \cap [1, n]| \geq \alpha n \quad \text{for all } 1 \leq n \leq N. \quad (5.7)$$

Note that the constraint $e \in A$ will ensure that both $A_{(e)}$ and $B_{(e)}$ contain 0.

Suppose first that B is not contained in A. Then we can simply choose $e = 0 \in A$,
since $B_{(0)} = A \cap B$ would then be strictly smaller than B, and from (5.3) and
(5.6) we have

$$|A_{(0)} \cap [1, n]| + |B_{(0)} \cap [1, n]| = |A \cap [1, n]| + |B \cap [1, n]| \geq \alpha n$$

as desired.
Now we consider the harder case when B is contained in A. Here we take

$$e := \min\{a \in A : a + B \not\subseteq A\}.$$

Note the set on the right-hand side is non-empty since the largest element of A
clearly belongs to this set. We have $e \in A$; by hypothesis, e is positive, and by
construction we have

$$(A \cap [0, e)) + B \subseteq A. \quad (5.8)$$

Also by Lemma 5.2 $B_{(e)}$ is strictly smaller than B. Thus it remains to show (5.7).
By (5.3) (and observing that $B \setminus (A - e)$ is disjoint from $[-e + 1, 0]$) we have

$$|A_{(e)} \cap [1, n]| + |B_{(e)} \cap [1, n]| = |A \cap [1, n]| + |B \cap [1, n]| - |(B \setminus (A - e)) \cap [n - e + 1, n]| \geq |A \cap [1, n]| + |B \cap [1, n - e]|.$$

If $B \cap [n - e + 1, n]$ is empty then the claim (5.7) would now follow from (5.6), so
we may assume $B \cap [n - e + 1, n]$ is non-empty. Then if we let b be the minimal
element in $B \cap [n - e + 1, n]$, then $b \in B \subseteq A$, and also since $e \in A \subseteq [0, N]$ we
see that $n - b \leq e - 1 < N$. We can now continue the previous calculation using
two applications of (5.8) and the induction hypothesis as


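$$\begin{aligned}
|A_{(e)} \cap [1, n]| + |B_{(e)} \cap [1, n]|
&\geq |A \cap [1, n]| + |B \cap [1, n - e]| \\
&= |A \cap [1, b - 1]| + 1 + |A \cap [b + 1, n]| + |B \cap [1, b - 1]| \\
&\geq |A \cap [1, b - 1]| + |B \cap [1, b - 1]| + 1 + |((A \cap [0, e)) + B) \cap [b + 1, n]| \\
&\geq \alpha(b - 1) + 1 + |((A \cap [0, e)) + b) \cap [b + 1, n]| \\
&\geq \alpha b + |A \cap [1, n - b]| \\
&\geq \alpha b + |((A \cap [0, e)) + B) \cap [1, n - b]| \\
&\geq \alpha b + |(A + B) \cap [1, n - b]| \\
&\geq \alpha b + \alpha(n - b) \\
&= \alpha n
\end{aligned}$$

as desired.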





For further discussion of Mann’s theorem and several variants, see [168]. 

The e-transform method also allows one to characterize when the above inequal- 
ities are sharp. We begin with an inverse theorem for Lemma 5.3. 

Proposition 5.8 Let A and B be additive sets in $\mathbf{Z}$ such that $|A|, |B| \geq 2$. Then
$|A + B| = |A| + |B| - 1$ if and only if A, B are arithmetic progressions of the
same step.
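
The characterization in Proposition 5.8 can be confirmed by exhaustive search over a short interval. The sketch below is illustrative only; `ap_step` and `proposition_holds` are hypothetical helper names introduced for the example.

```python
from itertools import combinations


def ap_step(S):
    """Return the common difference if the integer set S (|S| >= 2) is an
    arithmetic progression, and None otherwise."""
    xs = sorted(S)
    step = xs[1] - xs[0]
    return step if all(y - x == step for x, y in zip(xs, xs[1:])) else None


def proposition_holds(limit):
    """Check Proposition 5.8 for all A, B in [0, limit) with at least two elements."""
    sets = [set(c) for k in range(2, limit + 1)
            for c in combinations(range(limit), k)]
    for A in sets:
        for B in sets:
            extremal = len({a + b for a in A for b in B}) == len(A) + len(B) - 1
            same_step = ap_step(A) is not None and ap_step(A) == ap_step(B)
            if extremal != same_step:
                return False
    return True


print(ap_step({1, 4, 7}), ap_step({0, 1, 3}))
print(proposition_holds(7))
```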

Proof The "if" part is clear, so we prove the "only if" part. Let $e := \max(A) -
\min(B)$. From the proof of Lemma 5.3 we see that we must have

$$A + B = A_{(e)} + B_{(e)} = (A \cup (B + e)) + \min(B) = (A + \min(B)) \cup (B + \max(A)).$$

Now let $\min(B) + v$ be the second smallest element of B, after $\min(B)$; then $v > 0$
and for any $a \in A \setminus \{\max(A)\}$ we have

$$a + \min(B) + v \in A + B = (A + \min(B)) \cup (B + \max(A)) = (A + \min(B)) \cup ((B \setminus \{\min(B)\}) + \max(A)).$$

Note that since $a < \max(A)$ and $\min(B) + v$ is the minimal value of $B \setminus \{\min(B)\}$,
then $a + \min(B) + v$ cannot lie in $(B \setminus \{\min(B)\}) + \max(A)$. We conclude that
$a + v \in A$ for all $a \in A \setminus \{\max(A)\}$.
From this it is easy to see that A is an arithmetic progression of step v. In particular
$\max(A) - v$ is the second largest value of A after $\max(A)$, and by adapting the
previous argument we see that B is also an arithmetic progression of step v, and
we are done.





Now we give an inverse theorem for the Cauchy—Davenport inequality. 


Theorem 5.9 (Vosper's theorem) [375] Let p be a prime, and let A, B be additive
sets in $\mathbf{Z}_p$ such that $|A|, |B| \geq 2$ and $|A + B| \leq p - 2$. Then $|A + B| =
|A| + |B| - 1$ if and only if A and B are arithmetic progressions with the same step.


A similar theorem has recently been proven [174] in the case when |A + B| = 
|A| + |B|. A version of Vosper’s theorem exists for arbitrary groups Z but is more 
complicated to state; see [201], [231]. See also Exercise 5.1.11. 
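
For a small prime, Vosper's theorem can be verified by brute force. The following sketch makes two assumptions for the sake of illustration: progressions in Z_p are represented by their set of possible non-zero steps (so v and -v both count), and the names `ap_steps` and `vosper_holds` are invented here.

```python
from itertools import combinations


def ap_steps(S, p):
    """Non-zero steps v in Z_p for which S equals {a, a+v, ..., a+(|S|-1)v}
    for some starting point a in S."""
    return {v for v in range(1, p)
            if any({(a + i * v) % p for i in range(len(S))} == S for a in S)}


def vosper_holds(p):
    """Check Theorem 5.9 for all A, B in Z_p with |A|, |B| >= 2 and |A + B| <= p - 2."""
    sets = [frozenset(c) for k in range(2, p + 1)
            for c in combinations(range(p), k)]
    steps = {S: ap_steps(S, p) for S in sets}
    for A in sets:
        for B in sets:
            total = len({(a + b) % p for a in A for b in B})
            if total > p - 2:
                continue                     # outside the range of the theorem
            extremal = total == len(A) + len(B) - 1
            if extremal != bool(steps[A] & steps[B]):
                return False
    return True


print(sorted(ap_steps(frozenset({0, 2, 4}), 7)))
print(vosper_holds(7))
```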


Proof The "if" part is easy, so we prove the "only if" part. We first prove this
claim when A is an arithmetic progression $\{a, a + v, \ldots, a + nv\}$ for some $n \geq 1$.
Then by Cauchy-Davenport

$$\begin{aligned}
|B| + n &= |A| + |B| - 1 \\
&= |A + B| \\
&= |\{a, a + v, \ldots, a + (n - 1)v\} + \{0, v\} + B| \\
&\geq |B + \{0, v\}| + n - 1,
\end{aligned}$$

and hence (by Cauchy-Davenport again) we have $|B + \{0, v\}| = |B| + 1$. Thus B
and B + v only differ by at most one element, which implies that B is a progression
of step v (see Exercise 3.2.7). By symmetry we have the same claim when the
roles of A and B are reversed.

Now we use a duality trick to claim the following variant: if the sum set A + B is
a proper arithmetic progression, and $|A + B| = |A| + |B| - 1$, then so are A and B,
and all three progressions have the same step. To see this, set $C := -(\mathbf{Z}_p \setminus (A + B))$.
Then C is also an arithmetic progression with the same step as A + B and with
cardinality $|C| = p - |A + B| = p + 1 - |A| - |B| \geq 2$. Observe also that $C +
B \subseteq -(\mathbf{Z}_p \setminus A)$, because if any element $-a$ of $-A$ was contained in C + B, then
C would intersect $-a - B \subseteq -(A + B)$, a contradiction. Thus $|C + B| \leq p -
|A| = |C| + |B| - 1$, and hence by Cauchy-Davenport $|C + B| = |C| + |B| - 1$.
Since C was an arithmetic progression of length at least 2, we see from the previous
discussion that B is also, and has the same step as C. Similarly for A.

To summarize, we have now proven Vosper's theorem in the cases when at least
one of A, B, or A + B is an arithmetic progression. Now we handle the general
case. We induce on the size of B. If $|B| = 2$ then B is an arithmetic progression
already, and the claim has already been proved. Now suppose that $|B| > 2$ and
the claim has already been proven for smaller B. Suppose first that we can find
an $e \in A - B$ such that the e-transform $B_{(e)}$ of B has size $1 < |B_{(e)}| < |B|$. Since
$|A + B| = |A| + |B| - 1$ by hypothesis, we see from (5.1), (5.2) and the Cauchy-
Davenport inequality that we must have $A_{(e)} + B_{(e)} = A + B$ and

$$|A_{(e)} + B_{(e)}| = |A_{(e)}| + |B_{(e)}| - 1.$$

Using the induction hypothesis, we thus see that $A_{(e)}$ and $B_{(e)}$ are arithmetic
progressions with the same step v, and hence $A + B = A_{(e)} + B_{(e)}$ is also an arithmetic
progression, and the claim follows by the preceding discussion.

The only remaining case is if we have $|B_{(e)}| = 1$ or $|B_{(e)}| = |B|$ for all $e \in
A - B$. But if $E \subseteq A - B$ denotes all the $e \in A - B$ such that $|B_{(e)}| = |B|$, then
by Lemma 5.2 we have $B + E \subseteq A$, and hence $|E| \leq |A| - |B| + 1$ by Cauchy-
Davenport. Since $|A - B| \geq |A| + |B| - 1$ by Cauchy-Davenport, we thus see
that $|B_{(e)}| = 1$ for at least $2|B| - 2$ values of e. Since $B_{(e)}$ is a singleton subset of
B, and $2|B| - 2 > |B|$, we thus see from the pigeonhole principle that there exist
distinct $e, e' \in A - B$ and $b \in B$ such that $B_{(e)} = B_{(e')} = \{b\}$. Since
$|A + B| = |A| + |B| - 1$ by hypothesis, we see from (5.1), (5.2) that

$$A + B = A_{(e)} + b = A_{(e')} + b$$

and hence

$$A \cup (B + e) = A \cup (B + e').$$

Since A intersects B + e only in b + e, and A intersects $B + e'$ only in $b + e'$, we
thus see that B + e and $B + e'$ differ by at most one element. But this forces B to
be a progression (of step $e' - e$), and the claim follows.

We now develop an inverse theorem for sets A, B of integers with fairly small 
sum set. We need a preliminary lemma. 


Lemma 5.10 Let A be an additive set in $\mathbf{Z}$ such that $0 \in A$, let $N \geq 1$ be an
integer, and let $\phi_N : \mathbf{Z} \to \mathbf{Z}_N$ be the canonical quotient map. For each $x \in \phi_N(A)$,
let $\mu_x := |\{a \in A : \phi_N(a) = x\}|$ denote the multiplicity of $\phi_N$ at x, and denote
$m := \min_{x \in \phi_N(A) \setminus \{0\}} \mu_x$. Then

$$|2A| \geq |A| + |\phi_N(A)|(\mu_0 - 2m) + |2\phi_N(A)|(2m - 1).$$
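
The quantities in Lemma 5.10 are simple to compute for a concrete set. The sketch below is an illustration with an assumed helper name, `lemma_5_10_gap`; it returns the difference between |2A| and the lower bound of the lemma, and skips the degenerate case where A lies entirely in the residue class 0 (so that m is a minimum over an empty set).

```python
from collections import Counter
from itertools import combinations


def lemma_5_10_gap(A, N):
    """Return |2A| minus the lower bound of Lemma 5.10 for a set A containing 0,
    or None when phi_N(A) = {0} and m is undefined."""
    mu = Counter(a % N for a in A)                       # multiplicities mu_x
    nonzero = [c for x, c in mu.items() if x != 0]
    if not nonzero:
        return None
    m = min(nonzero)
    image_doubled = {(x + y) % N for x in mu for y in mu}    # 2 phi_N(A)
    bound = len(A) + len(mu) * (mu[0] - 2 * m) + len(image_doubled) * (2 * m - 1)
    return len({a + b for a in A for b in A}) - bound


# Exhaustive check on subsets of [0, 8) that contain 0, for moduli 2..6.
gaps = [lemma_5_10_gap({0} | set(c), N)
        for N in range(2, 7)
        for k in range(8)
        for c in combinations(range(1, 8), k)]
print(all(g is None or g >= 0 for g in gaps))
```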

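
Proof We split (using Lemma 5.3 and the observation $\sum_{x \in \phi_N(A)} \mu_x = |A|$)

$$\begin{aligned}
|2A| &= \sum_{x \in \phi_N(2A)} |2A \cap \phi_N^{-1}(\{x\})| \\
&\geq \sum_{x \in \phi_N(2A)} \ \sup_{y, z \in \phi_N(A) : y + z = x} |(A \cap \phi_N^{-1}(\{y\})) + (A \cap \phi_N^{-1}(\{z\}))| \\
&\geq \sum_{x \in \phi_N(2A)} \ \sup_{y, z \in \phi_N(A) : y + z = x} \left( |A \cap \phi_N^{-1}(\{y\})| + |A \cap \phi_N^{-1}(\{z\})| - 1 \right) \\
&= \left( \sum_{x \in \phi_N(2A)} \ \sup_{y, z \in \phi_N(A) : y + z = x} (\mu_y + \mu_z) \right) - |\phi_N(2A)| \\
&\geq \left( \sum_{x \in \phi_N(A)} (\mu_0 + \mu_x) \right) + \left( \sum_{x \in \phi_N(2A) \setminus \phi_N(A)} 2m \right) - |\phi_N(2A)| \\
&= \mu_0 |\phi_N(A)| + |A| + (|\phi_N(2A)| - |\phi_N(A)|) 2m - |\phi_N(2A)|
\end{aligned}$$

as desired (noting that $2\phi_N(A) = \phi_N(2A)$).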

Now we give the inverse theorem. 


Theorem 5.11 (3k - 3 theorem) [116] Let A be an additive set in $\mathbf{Z}$ such
that $|2A| < 3|A| - 3$. Then there exists a proper arithmetic progression $P =
a + [0, |2A| - |A|] \cdot v$ of length $|2A| - |A| + 1$ that contains A.
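
A brute-force check of the 3k - 3 theorem on subsets of a short interval may help make the statement concrete. In this sketch the shortest progression containing A is obtained from the greatest common divisor of the differences; the names `covering_progression_length` and `theorem_holds` are assumptions made for the example.

```python
from functools import reduce
from itertools import combinations
from math import gcd


def covering_progression_length(A):
    """Length of the shortest arithmetic progression containing the integer set A."""
    xs = sorted(A)
    if len(xs) == 1:
        return 1
    step = reduce(gcd, (x - xs[0] for x in xs[1:]))
    return (xs[-1] - xs[0]) // step + 1


def theorem_holds(limit):
    """Check Theorem 5.11 for every non-empty subset A of [0, limit)."""
    for k in range(1, limit + 1):
        for c in combinations(range(limit), k):
            A = set(c)
            doubled = len({a + b for a in A for b in A})
            if doubled < 3 * len(A) - 3:
                if covering_progression_length(A) > doubled - len(A) + 1:
                    return False
    return True


print(covering_progression_length({0, 4, 6, 10}))
print(theorem_holds(12))
```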


Proof We use an argument from [233]. By translating A we may assume that
$\min(A) = 0$. We may also assume that the set A has no common divisor $d > 1$,
since otherwise we could replace A by $\frac{1}{d} \cdot A$. We will assume that $|A| \geq 3$ as the
cases $|A| = 1, 2$ can be verified directly.

Write $N := \max(A)$, thus $A \subseteq [0, N]$ and $0, N \in A$. It will suffice to show that
$N \leq |2A| - |A|$. Suppose for contradiction that $N > |2A| - |A|$. We now apply
Lemma 5.10. Observe in this case that $\mu_0 = 2$ and $m = 1$, and hence

$$|2A| \geq |A| + |2\phi_N(A)|. \quad (5.9)$$

Since we are assuming $N > |2A| - |A|$, we conclude that

$$|2\phi_N(A)| < N. \quad (5.10)$$

By Exercise 2.1.6 and the hypothesis $|2A| < 3|A| - 3$ we have

$$|2\phi_N(A)| < 2|A| - 3 = 2|\phi_N(A)| - 1.$$

If N were prime then we could apply the Cauchy-Davenport inequality to conclude
the desired contradiction. But in general we must rely instead on Kneser's theorem.
Let $H := \mathrm{Sym}_1(2\phi_N(A))$, then by Kneser's theorem we have

$$|2\phi_N(A)| \geq 2|\phi_N(A) + H| - |H|$$


and hence if we set $k := |\phi_N(A) + H| - |\phi_N(A)|$, then

$$k \leq \frac{|H| - 2}{2}. \quad (5.11)$$

In particular $|H| \geq 2$. Also from (5.10) we have $|H| < N$. Since H is a subgroup
of $\mathbf{Z}_N$, we see that $H = (h \cdot \mathbf{Z})/(N \cdot \mathbf{Z})$ for some $2 \leq h < N$ which is a factor
of N.

Note that $\phi_N(A)$ contains zero, but cannot be contained entirely inside H as
this would mean that A has a common divisor of h, contradicting our hypothesis.
So we know that $\phi_N(A)$ intersects at least two cosets of H, or equivalently that
$|\phi_h(A)| \geq 2$.

Now we apply Lemma 5.10 again, but with N replaced by h. From (5.11) we
see that if $x + H \subseteq \mathbf{Z}_N$ is any non-trivial coset of H, then $H \cup (x + H)$ intersects
$\phi_N(A)$ in at least $2|H| - k$ points; since $\phi_N(0) = \phi_N(N) = 0 \in H \cup (x + H)$,
this implies that $\phi_N^{-1}(H \cup (x + H)) = \phi_h^{-1}(\{0, x \bmod h\})$ intersects A in at least
$2|H| - k + 1$ points. In other words we have

$$\mu_0 + m \geq 2|H| - k + 1.$$

A similar argument gives

$$m \geq |H| - k.$$

But since H was the symmetry group of $2\phi_N(A)$, we see that $2\phi_h(A)$ has trivial
symmetry group; furthermore from (5.10) we see that $|2\phi_h(A)| < h$. Thus by
Kneser's theorem we have $|2\phi_h(A)| \geq 2|\phi_h(A)| - 1$. Inserting all these facts into
Lemma 5.10, we obtain


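$$\begin{aligned}
|2A| &\geq |A| + |\phi_h(A)|(\mu_0 - 2m) + |2\phi_h(A)|(2m - 1) \\
&\geq |A| + |\phi_h(A)|(2|H| - k - 3m + 1) + (2|\phi_h(A)| - 1)(2m - 1) \\
&= |A| + |\phi_h(A)|(2|H| - k - 1) + (|\phi_h(A)| - 2)m + 1 \\
&\geq |A| + |\phi_h(A)|(2|H| - k - 1) + (|\phi_h(A)| - 2)(|H| - k) + 1 \\
&= |A| + 3|\phi_h(A)||H| - 2k|\phi_h(A)| - |\phi_h(A)| - 2|H| + 2k + 1 \\
&= |A| + 2|\phi_h(A)||H| - 2k - 1 + (|\phi_h(A)| - 2)(|H| - 2k - 1) \\
&= |A| + 2(|A| - 1 + k) - 2k - 1 + (|\phi_h(A)| - 2)(|H| - 2k - 1) \\
&= 3|A| - 3 + (|\phi_h(A)| - 2)(|H| - 2k - 1) \\
&\geq 3|A| - 3,
\end{aligned}$$

where we have used the identity $|\phi_h(A)||H| = |\phi_N(A) + H| = |\phi_N(A)| + k = |A| - 1 + k$,
the bound $|\phi_h(A)| \geq 2$, and the bound $|H| - 2k - 1 \geq 1$ from (5.11). This
contradicts the hypothesis $|2A| < 3|A| - 3$.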

Note that we have used a result on torsion groups to imply a result in the 
torsion-free case; this phenomenon will also come up in later proofs of Freiman’s 
theorem. The original proof of Freiman was somewhat different; see [116], [257]. 
A treatment of the case |2A| = 3|A| — 3 appears in [113], [28]. For some partial 
progress in the case |2A| = 3|A| + o(|A|), see [193]. There has also been much
work on generalizing the 3k — 3 theorem to pairs of sets [111], [336], [333], [233]. 
For instance one has the following result. 


Theorem 5.12 [233] Let A, B be additive sets in Z such that |A + B| < |A| + 
|B| + min(|A|, |B|) — 3. Then A is contained in an arithmetic progression of 
length at most |A + B| — |B| + 1 and B is contained in an arithmetic progres- 
sion of length at most |A + B| — |A| + 1, where both progressions have the same 
difference. 


For some further refinements to this theorem, see [233]. 


Exercises 


5.1.1 Prove the remaining claims in Corollary 5.6. 

5.1.2 Prove Lemma 5.2.

5.1.3 Show that Kneser’s theorem implies Lemma 5.3 and the Cauchy- 
Davenport inequality. 

5.1.4 [211] Let A, B be additive sets in an ambient group. Show that if $|A +
B| < |A| + |B|$ then

$$|A + B| = |A + \mathrm{Sym}_1(A + B)| + |B + \mathrm{Sym}_1(A + B)| - |\mathrm{Sym}_1(A + B)|.$$

5.1.5 [244] Let A, B be additive sets in an ambient group Z such that $|A + B| \leq
|A| + |B| - 1$. Show that $|(A + \mathrm{Sym}_1(A + B)) \setminus A| \leq |\mathrm{Sym}_1(A + B)| -
1$; thus A is rather close to being a union of cosets of $\mathrm{Sym}_1(A + B)$.

5.1.6 [243] If A is a (possibly infinite) set of integers, define the Schnirelmann
density $\sigma(A)$ of A to be the quantity

$$\sigma(A) := \inf_{N > 0} \mathbf{E}_{x \in [1, N]} \mathbf{I}(x \in A) = \inf_{N > 0} \frac{|A \cap [1, N]|}{|[1, N]|}.$$

(Note that this is distinct from the lower density $\underline{\sigma}(A)$ defined in Definition
1.21, due to the use of the inf rather than the lim inf.) Show that if A and
B are any sets of integers with $0 \in A, B$, then $\sigma(A + B) \geq \min(\sigma(A) +
\sigma(B), 1)$. (Hint: use Theorem 5.7.) Conclude that if $0 \in A$ and $\sigma(A) \geq
1/k$ for some integer $k > 0$, then $kA \supseteq \mathbf{Z}^+$. Thus every set of integers
of positive Schnirelmann density that contains 0 is a basis for the positive
integers.


5.1.7 [312] Let A, B be sets of integers such that $1 \in A$ and $0 \in B$. Show that
$\sigma(A + B) \geq \sigma(A) + \sigma(B) - \sigma(A)\sigma(B)$, where $\sigma()$ is the Schnirelmann
density from Exercise 5.1.6. (Hint: order the positive elements of A
as $a_1 < a_2 < \cdots$, and observe that $|(A + B) \cap [a_n, a_{n+1})| \geq 1 + |B \cap
[1, a_{n+1} - a_n - 1]|$.)

5.1.8 [311], [201], [202] Let A and B be additive sets in an ambient group Z.
Prove that

$$|A + B| \geq |A| + |B| - \min_{c \in A + B} |\{(a, b) \in A \times B : a + b = c\}|.$$

(This can be done either by Kneser's theorem, or more directly via the
e-transform method.)

5.1.9 Let p be a prime, let $N \geq 1$, and let $A_1, \ldots, A_N$ be additive sets in $\mathbf{Z}_p$
such that $|A_1| + \cdots + |A_N| \geq p + N - 1$. Use the Cauchy-Davenport
inequality to show that $A_1 + \cdots + A_N = \mathbf{Z}_p$. Conversely, show that this
statement can be used to imply the Cauchy-Davenport inequality.

5.1.10 What happens if one extends Theorem 5.9 to cover the cases $|A| = 1$,
$|B| = 1$, or $|A + B| = p - 1$? (The case $|A + B| = p$ is much more
difficult to analyze and does not have as simple a characterization.)

5.1.11 [201] Let A, B be additive sets in ambient group Z such that $|A|, |B| > 1$,
$|\mathrm{Sym}_1(A + B)| = 1$, and $|A + B| < |A| + |B|$. By analyzing the proof
of Kneser's theorem (and Vosper's theorem) carefully, show that A + B is
either equal to an arithmetic progression, or there exists a finite subgroup
G of Z such that A + B consists of one or more cosets of G, and possibly a
subset of one other coset of G. (Compare with Exercise 5.1.5 and Exercise
3.2.7.)

5.1.12 [242] Let A, B be open subsets of the torus $(\mathbf{R}/\mathbf{Z})^d$. Prove the Mann-
Kneser-Macbeath inequality $\mathrm{mes}(A + B) \geq \min(\mathrm{mes}(A) + \mathrm{mes}(B), 1)$,
where mes() denotes the usual Haar measure on the torus. (Hint: discretize
the torus to $(\mathbf{Z}/p\mathbf{Z})^d$ for some large prime p, apply Kneser's theorem,
and then take limits.) Give examples to show that this inequality cannot
be improved. One can extend this result to arbitrary measurable subsets
of the torus with some additional analytic arguments. See [27] for some
recent developments concerning this inequality. This inequality should
be contrasted with the Brunn-Minkowski inequality (Theorem 3.16),
and shows that sum sets in $(\mathbf{R}/\mathbf{Z})^d$ and sum sets in $\mathbf{R}^d$ behave slightly
differently.

5.1.13 [116] Let $N > 0$ be an integer, and let A, B be non-empty subsets of
[0, N] such that $0, N \in A$ and $|A| + |B| \geq N + 3$. Prove that $|A + B| \geq
|B| + N$.

5.1.14 Show that Theorem 5.11 fails when $|2A| = 3|A| - 3$, by considering a
progression of rank 2. Also show that the quantity $|2A| - |A|$ in that
theorem cannot be replaced by any smaller quantity.

5.1.15 Let A, B be additive sets of integers. If $A \hat{+} B := \{a + b : a \in A, b \in B, a \neq
b\}$ denotes the restricted sum set of A and B, show that $|A \hat{+} B| \geq |A| +
|B| - 3$. (Hint: a direct application of the e-transform will not work, but
if one deconstructs the proof of Lemma 5.3 one can modify it to deal
with restricted sum sets.) If $|A| \neq |B|$, improve the preceding bound
to $|A \hat{+} B| \geq |A| + |B| - 2$. (Hint: one needs to adapt some ideas from
Proposition 5.8.) An analogous result for $\mathbf{Z}_p$ is known, but requires more
non-elementary methods; see Section 9.2.


5.2 Sum sets in vector spaces 


We now study the minimal size of sum sets in a real finite-dimensional vector space 
V, exploiting such concepts as convexity which are not readily available in other 
groups. Of course, since V contains a copy of Z, we know from Lemma 5.3 that 
|A + B| can be as small as |A| + |B| — 1. However, one can do better than this if 
one knows that A + B is high-dimensional, or in other words that it is not contained 
in a low-dimensional affine vector space (a translate of a linear vector space). 

We begin with the case A = B, which is somewhat easier. Define the rank 
rank(A) of a subset of V to be the smallest d such that A is contained in an affine 
space of dimension d. 


Lemma 5.13 (Freiman's lemma) [116] Let A be an additive set in a finite-
dimensional vector space V, and suppose that $\mathrm{rank}(A) \geq d$ for some $d \geq 1$.
Then we have

$$|A + A| \geq (d + 1)|A| - \frac{d(d + 1)}{2}.$$

Proof We induce on d. If $d = 1$ then the claim follows from Theorem 5.5, so
let us assume $d \geq 2$ and that the claim is already proven for $d - 1$. Now we fix d
and induce on $|A|$. The claim is vacuously true if say $|A| = 1$, so assume $|A| \geq 2$
and that the claim has already been proven for smaller sets A. Let $a \in A$ be any
extreme point of A; thus a is a vertex on the convex hull of A. Let $A' := A \setminus \{a\}$.
We divide into two cases. If $\mathrm{rank}(A') \geq d$, then by induction hypothesis

$$|A' + A'| \geq (d + 1)|A'| - \frac{d(d + 1)}{2}.$$

Since a lies outside of the convex hull of $A'$ and $\mathrm{rank}(A') \geq d$, there must exist (by
the greedy algorithm) at least d extreme points $x_1, \ldots, x_d$ of $A'$ which are visible
from a in the sense that the line segments joining a to $x_1, \ldots, x_d$ lie outside the
convex hull of $A'$. In particular we see that the d + 1 points $a, \frac{a + x_1}{2}, \ldots, \frac{a + x_d}{2}$ lie
outside the convex hull of $A'$, and in particular outside of $\frac{1}{2} \cdot (A' + A')$. Dilating
this by 2 we see that $a + a, a + x_1, \ldots, a + x_d$ are disjoint from $A' + A'$. Thus

$$|A + A| \geq d + 1 + |A' + A'| \geq (d + 1)|A| - \frac{d(d + 1)}{2},$$

thus closing the induction.

It remains to consider the case when $\mathrm{rank}(A') < d$, thus $A'$ is contained in
a $(d - 1)$-dimensional affine space W. Since $\mathrm{rank}(A) \geq d$, we have $a \notin W$. This
means that 2a, a + W, and 2W are all disjoint; thus $a + a$, $a + A'$, and $A' + A'$
are all disjoint; thus

$$|A + A| \geq 1 + |A| - 1 + |A' + A'|.$$

But since $\mathrm{rank}(A) \geq d$, we have $\mathrm{rank}(A') = \mathrm{rank}(A \setminus \{a\}) \geq d - 1$, and hence by
induction

$$|A + A| \geq |A| + d|A'| - \frac{d(d - 1)}{2} = (d + 1)|A| - \frac{d(d + 1)}{2},$$

and the claim again follows by induction.
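
Freiman's lemma can be checked by brute force in the plane. The sketch below restricts attention to subsets of a small grid in Z^2, viewed inside R^2, and tests the bound with d equal to the rank of the set (the strongest admissible choice). The names `rank` and `freiman_lemma_holds` are assumptions introduced for this illustration.

```python
from itertools import combinations, product


def rank(points):
    """Affine rank (0, 1 or 2) of a finite non-empty set of points in Z^2."""
    pts = sorted(points)
    if len(pts) == 1:
        return 0
    (x0, y0), (x1, y1) = pts[0], pts[1]
    collinear = all((x1 - x0) * (y - y0) == (y1 - y0) * (x - x0) for x, y in pts)
    return 1 if collinear else 2


def freiman_lemma_holds(side):
    """Check |A + A| >= (d+1)|A| - d(d+1)/2 with d = rank(A) for all subsets
    of the side-by-side grid."""
    grid = list(product(range(side), repeat=2))
    for k in range(1, len(grid) + 1):
        for c in combinations(grid, k):
            d = rank(c)
            doubled = len({(a[0] + b[0], a[1] + b[1]) for a in c for b in c})
            if d >= 1 and doubled < (d + 1) * k - d * (d + 1) // 2:
                return False
    return True


# The triangle {(0,0), (1,0), (0,1)} has rank 2 and |A + A| = 6 = 3*3 - 3.
print(freiman_lemma_holds(3))
```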





Now we consider the problem of sums of two sets A, B in V. To make this
problem more precise, let us temporarily define the quantity S(d, n, t) for any
$n \geq 1$, $t \geq 0$, and $d \geq 0$, to be the least value of |A + B|, where A, B ranges
over all additive sets in a finite-dimensional vector space V, such that $|A| \geq n$,
$|B| \geq n - t$, and $\mathrm{rank}(A + B) \geq d$. Since $|A + B| \geq |A|$ we have the trivial bound

$$S(d, n, t) \geq n. \quad (5.12)$$


This bound is however not sharp in general, and we shall improve it presently. We 
first need a lemma analyzing the behavior of A + B near an extreme point of A 
and B, similar to that used in the proof of Lemma 5.13. 


Lemma 5.14 [296] Let A, B be additive sets in a finite-dimensional vector space
V such that A and B both contain 0, and suppose that 0 is a vertex on the convex hull
of $A \cup B$. Let $A' := A \setminus \{0\}$ and $B' := B \setminus \{0\}$, and $C := (A' \cup B') \setminus (A' + B')$.
Then A + B lies in the subspace of V spanned by C.


Proof Without loss of generality we may take $V = \mathbf{R}^d$. By the Hahn-Banach
theorem, there exists a linear functional $\phi : V \to \mathbf{R}$ such that $\phi(x) > 0$ for all
$x \in (A \cup B) \setminus 0$. We need to show that every element x of A + B lies in the span
of C. We shall prove this by induction on $\phi(x)$, which is non-negative and takes
only finitely many values on A + B. If
$\phi(x) = 0$, then $x = 0$ and there is nothing to prove. Now suppose that $\phi(x) > 0$ and
the claim has already been shown for all smaller values of $\phi(x)$. If $x \in A' + B'$, then
we can write $x = a + b$ where $a \in A'$ and $b \in B'$. But since $\phi(x) = \phi(a) + \phi(b)$
and $\phi(a), \phi(b) > 0$, we see that $\phi(a), \phi(b)$ are strictly less than $\phi(x)$, and the
claim follows from induction. The only remaining case is when $\phi(x) > 0$ and
$x \notin (A' + B')$. But since $x \in A + B$, this implies that $x \in C$, and we are done.

We can now obtain the following recursive inequality on S(d, n, t).

Proposition 5.15 [296] Let $d \geq 1$, $n \geq 2$, and $t \leq n - 2$. Then

$$S(d, n, t) \geq \min(S(d, n - 1, t) + d + 1,\ S(d - 1, n - 1, t) + n,\ S(d - 1, n - 1, t - 1) + n - t).$$





Proof Let A, B be as in the definition of S(d, n, t), note that A and B contain 
at least two elements. Since A and B are finite, we can find a linear functional 
$ : V — R which is injective on A U B (indeed one could select ¢ randomly). 
Since ¢ is injective, we see that there is a unique element aọ E A which minimizes 
¢ on A, i.e. O(a) > (ao) for all a € A\ao. Similarly we can find a bọ € B which 
minimizes @ on B, so that ¢(b) > ¢(bo) for all b € B\bo. By translating A and 
B if necessary we may assume aọ = bo = 0. Thus A and B now both contain 0, 
and if we define A’ := A\{0} and B’ := B\{0}, then ¢ is strictly positive on both 
A’ and B’. In particular ¢ is strictly positive on A’ + B’, which therefore does not 
contain 0. 
From Lemma 5.14 we have 


A'U B\(A' + BY > d 
and hence (since 0 is contained in A + B but not A’, B’, or A’ + B’) 
|A+ B|>|A'+ B’|4+d+1. 


Let $c + W$ denote the affine span of $A' + B'$, where $c \in V$ and W is a linear
subspace of V. If we knew that $\mathrm{rank}(A' + B') = \dim(W) \geq d$, we could then
conclude that $|A' + B'| \geq S(d, n-1, t)$, and we would be done. Thus we may
assume that $\dim(W) \leq d - 1$. If we pick $a_1 \in A'$ and $b_1 \in B'$ arbitrarily,
then we have $A' \subseteq a_1 + W$ and $B' \subseteq b_1 + W$. Thus $A + B$ is contained in the
span of W, $a_1$, and $b_1$. By hypothesis, this means that at least one of $a_1$, $b_1$ must
lie outside of W.

We now divide into a number of cases depending on the relative position of $a_1$
and $b_1$ with respect to W. Suppose first that $a_1$ and $b_1$ are linearly independent
modulo W. Then $A = \{0\} \cup A'$ lies in $\{0, a_1\} + W$, and is thus disjoint from $A + B'$,
which lies in $\{b_1, a_1 + b_1\} + W$; so

$$|A + B| \geq |A + B'| + |A| \geq |A + B'| + n.$$


214 5 Inverse sum set theorems 


On the other hand, $\mathrm{rank}(A + B') \geq \mathrm{rank}(A + B) - 1 \geq d - 1$, which implies
$|A + B'| \geq S(d-1, n-1, t)$. The claim thus follows in this case.

Now suppose that $a_1$, $b_1$ are linearly dependent modulo W and $b_1 \notin W$. Then
$A' \subseteq a_1 + W$ and $A' + B' \subseteq a_1 + b_1 + W$ are disjoint, while 0 is disjoint from $A'$
(by definition) and $A' + B'$ (by previous remarks). Thus

$$|A + B| \geq 1 + |A'| + |A' + B'| \geq n + |A' + B'|.$$

On the other hand, since $A + B$ is contained in the span of W and $b_1$, we have
$\mathrm{rank}(A' + B') = \dim(W) \geq \mathrm{rank}(A + B) - 1 \geq d - 1$, hence
$|A' + B'| \geq S(d-1, n-1, t)$. The claim again follows.

The only remaining case is when $b_1 \in W$, which forces $a_1 \notin W$ by the previous
discussion. Then $A' + B$ and $B$ are disjoint, thus

$$|A + B| \geq |B| + |A' + B| \geq n - t + |A' + B|.$$

But since $\mathrm{rank}(A' + B) \geq \mathrm{rank}(A + B) - 1 \geq d - 1$, we have
$|A' + B| \geq S(d-1, n-1, t-1)$, and the claim again follows.














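Corollary 5.16 [296] For any $n \geq 1$, $t \geq 0$, $d \geq 0$ we have

$$S(d, n, t) \geq \sum_{n-d \leq r \leq n} r - \sum_{1 \leq s \leq t} \min(s, d).$$

Proof The cases $d = 0$, $n = 1$, or $t \geq n - 1$ can be easily verified from (5.12),
so we may restrict ourselves to the case $d \geq 1$, $n \geq 2$, and $t \leq n - 2$. We shall
induce on the positive quantity $n + d + t$, assuming inductively that the claim has
already been proven for all smaller values of $n + d + t$. But then we have

$$\begin{aligned}
S(d, n-1, t) + d + 1 &\geq \sum_{n-d-1 \leq r \leq n-1} r - \sum_{1 \leq s \leq t} \min(s, d) + d + 1 \\
&= \sum_{n-d \leq r \leq n} r - \sum_{1 \leq s \leq t} \min(s, d), \\
S(d-1, n-1, t) + n &\geq \sum_{n-d \leq r \leq n-1} r + n - \sum_{1 \leq s \leq t} \min(s, d-1) \\
&\geq \sum_{n-d \leq r \leq n} r - \sum_{1 \leq s \leq t} \min(s, d), \\
S(d-1, n-1, t-1) + n - t &\geq \sum_{n-d \leq r \leq n-1} r - \sum_{1 \leq s \leq t-1} \min(s, d-1) + n - t \\
&\geq \sum_{n-d \leq r \leq n} r - \sum_{1 \leq s \leq t} \min(s, d),
\end{aligned}$$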











and the claim follows from Proposition 5.15. 
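
As a sanity check on the three summation steps in this proof, the following minimal sketch evaluates the right-hand side of Corollary 5.16 and confirms numerically that each of the three terms in the minimum of Proposition 5.15 dominates it. The helper names `rhs` and `check_induction_step` are ours and purely illustrative; the check is an arithmetic illustration, not a substitute for the proof.

```python
def rhs(d, n, t):
    """Right-hand side of Corollary 5.16: the sum of r over n-d <= r <= n,
    minus the sum of min(s, d) over 1 <= s <= t."""
    return sum(range(n - d, n + 1)) - sum(min(s, d) for s in range(1, t + 1))


def check_induction_step(d, n, t):
    """Check that each of the three terms inside the minimum of
    Proposition 5.15, with S replaced by rhs, is at least rhs(d, n, t)."""
    target = rhs(d, n, t)
    first = rhs(d, n - 1, t) + d + 1
    second = rhs(d - 1, n - 1, t) + n
    third = rhs(d - 1, n - 1, t - 1) + n - t
    return first >= target and second >= target and third >= target


if __name__ == "__main__":
    ok = all(
        check_induction_step(d, n, t)
        for d in range(1, 7)
        for n in range(2, 12)
        for t in range(0, n - 1)  # the range t <= n - 2 of Proposition 5.15
    )
    print("all induction steps verified:", ok)
```

The third step rests on the identity $\min(s+1, d) - \min(s, d-1) = 1$ for $s \geq 0$ and $d \geq 1$, which is why the bound for $S(d-1, n-1, t-1)$ loses nothing.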







This inequality is sharp in many cases, although there have been some 
refinements using techniques relating to the Brunn—Minkowski inequality 
(Theorem 3.16); see [128], [129]. As a consequence of the inequality we obtain 
the following generalization of Theorem 5.13: 


Theorem 5.17 [296] Let V be a finite-dimensional vector space and $d \geq 0$, and
let A, B be additive sets in V such that $\mathrm{rank}(A + B) \geq d$. Then we have

$$|A + B| \geq |A| + d|B| - \frac{d(d+1)}{2}.$$

Proof Apply Corollary 5.16 with $n := |A|$, $t := |A| - |B|$ and use the trivial
bound $\sum_{1 \leq s \leq t} \min(s, d) \leq dt$ to obtain

$$|A + B| \geq (d+1)\left(n - \frac{d}{2}\right) - dt = n + d(n - t) - \frac{d(d+1)}{2}$$











as desired. 
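
The bound of Theorem 5.17 can be tested by brute force on small planar sets. The sketch below makes two assumptions that should be read as ours: that rank means the dimension of the affine span (as in the proof of Proposition 5.15), and that $|A| \geq |B|$, which is what the choice $t := |A| - |B|$ in the proof requires. All function names are hypothetical.

```python
from itertools import combinations, product


def sumset(A, B):
    """Return A + B for finite sets of points in Z^2."""
    return {(a[0] + b[0], a[1] + b[1]) for a in A for b in B}


def rank2(points):
    """Dimension (0, 1 or 2) of the affine span of a non-empty finite set in Z^2."""
    pts = sorted(points)
    x0, y0 = pts[0]
    dirs = [(x - x0, y - y0) for (x, y) in pts[1:]]
    if not dirs:
        return 0
    ux, uy = dirs[0]
    # All direction vectors parallel to the first one means the set is collinear.
    if all(ux * vy - uy * vx == 0 for (vx, vy) in dirs):
        return 1
    return 2


def nonempty_subsets(points):
    pts = list(points)
    for size in range(1, len(pts) + 1):
        for combo in combinations(pts, size):
            yield frozenset(combo)


def bound_holds(A, B):
    """Check |A + B| >= |A| + d|B| - d(d+1)/2 with d = rank(A + B)."""
    S = sumset(A, B)
    d = rank2(S)
    # Multiply through by 2 to stay in integer arithmetic.
    return 2 * len(S) >= 2 * len(A) + 2 * d * len(B) - d * (d + 1)


def check_grid(width=3, height=2):
    """Test every pair of non-empty subsets A, B of a small grid with |A| >= |B|."""
    grid = list(product(range(width), range(height)))
    subsets = list(nonempty_subsets(grid))
    return all(bound_holds(A, B) for A in subsets for B in subsets if len(A) >= len(B))


if __name__ == "__main__":
    print("bound holds on the 3 x 2 grid:", check_grid())
```

Taking $A = B$ equal to the full $3 \times 2$ grid gives $|A + B| = 15 = 6 + 2 \cdot 6 - 3$, so the bound is attained there.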





We now return to additive sets in a vector space with small doubling. Define a
d-parallelepiped P in a vector space V to be any set of the form

$$P = a + I \cdot v_1 + \cdots + I \cdot v_d$$

where $v_1, \ldots, v_d$ are vectors in V (not necessarily linearly independent), $a \in V$,
and $I = \{x \in \mathbf{R} : -1 \leq x \leq 1\}$ is the closed unit interval. The $2^d$ points
$a + \{-1, 1\} \cdot v_1 + \cdots + \{-1, 1\} \cdot v_d$ (which may possibly have multiplicity) are called
the corners of this d-parallelepiped, while a is the center; note that the corners form
a progression of rank d and dimensions (2, ..., 2), which may or may not be proper.
A remarkable fact, known as the Freiman cube lemma, is that if an additive set A in
a d-dimensional vector space has small doubling, then there is a d-parallelepiped
which contains a large fraction of A and whose corners lie in the set A. This is
certainly not true for general sets A, as can be seen for instance by considering the
set $\{(n, n^2) : -N < n < N\}$ in $\mathbf{Z}^2 \subset \mathbf{R}^2$. To prove the Freiman cube lemma we
first prove an auxiliary lemma which is useful for inductive purposes:


Lemma 5.18 [28] Let V be a d-dimensional vector space, and let W be a $(d - r)$-dimensional
linear subspace of V for some $0 < r \leq d$. Let A be a symmetric
additive set in V (thus $-A = A$) and let $K = \sigma[A] = |A + A|/|A|$ be the doubling
constant. Then there exists an r-parallelepiped P with corners in A and center 0
such that


IAN (P + W)| > (OK)? * Al. 


Proof We induce on the codimension r. First suppose that $r = 1$. Without loss
of generality we may take V to be a Euclidean space $\mathbf{R}^d$. We let $v_1$ be an element
of A which maximizes the quantity $\mathrm{dist}(v_1, W)$; then it is easily seen that the
1-parallelepiped $P = 0 + I \cdot v_1$ will obey the desired properties (here we exploit the
symmetry of A to place both corners of P in A).

Now suppose that $r \geq 2$ and that the claim has already been proven for all
smaller values of r. We place W inside a $(d-1)$-dimensional hyperplane $H \subset V$,
which divides V into the hyperplane H and two open half-spaces $H_-$ and $H_+$.
By the pigeonhole principle, one of the three sets $A \cap H$, $A \cap H_-$, and $A \cap H_+$
has cardinality at least $|A|/3$.

Suppose first that $|A \cap H| \geq |A|/3$. Then by applying the induction hypothesis
(with V replaced by H and d replaced by $d - 1$) we can find an $(r-1)$-parallelepiped
$P \subset H \subset V$ with corners in $A \cap H \subseteq H$ and center 0 such
that


IAN (P+ W)| > (AN A)A(P+W)| > OK)? FH YAI/3 
> (9K) HA]. 


The claim then follows by adding a dummy vector $v_r = 0$ to P to make it an
r-parallelepiped.

Without loss of generality, it remains to consider the case when $|A \cap H_+| \geq
|A|/3$. Since $|2(A \cap H_+)| \leq |2A| \leq K|A|$, we conclude that $\sigma[A \cap H_+] \leq 3K$.
By Exercise 2.3.14, we can find a set $F \subseteq A \cap H_+$ which is symmetric around
some origin $a = x/2$ (thus $F = x - F$), with $|F| \geq |A|/9K$
and $\sigma[F] \leq 9K^2$. Since F is contained entirely in the half-space $H_+$, we see that
$a \in H_+$ also. In particular, $a \notin W$. Now let $W'$ be the $(d - r + 1)$-dimensional linear
space spanned by W and a, and apply the induction hypothesis with A replaced by
$F - a$, K replaced by $9K^2$, W replaced by $W'$ and r replaced by $r - 1$. This allows
us to find an $(r-1)$-parallelepiped $P' = a + I \cdot v_1 + \cdots + I \cdot v_{r-1}$ with center a and
corners in F such that


|F A (P' + W)| > (81K 7 HF] > OK) HA, 


Now we let P be the r-parallelepiped $I \cdot a + I \cdot v_1 + \cdots + I \cdot v_{r-1}$; since F and
$-F$ are both contained in A (by the symmetry of A) we see that the corners of P
lie in A, and P is certainly centered at the origin. To conclude the proof we need
to show that

$$|A \cap (P + W)| \geq |F \cap (P' + W')|.$$

To prove this, we use a sliding argument taking advantage of the symmetries of
A and F. Let us split $W' = W_{>0} \cup W_{\leq 0}$, where $W_{>0}$ is the open half-space in $W'$
with boundary W which contains a, and $W_{\leq 0}$ is the closed half-space in $W'$ with
boundary W which excludes a. Then

$$\begin{aligned}
|F \cap (P' + W')| &= |F \cap (P' + W_{>0})| + |F \cap (P' + W_{\leq 0})| \\
&= |(F - 2a) \cap (P' + W_{>0} - 2a)| + |F \cap (P' + W_{\leq 0})| \\
&= |(-F) \cap (P' + W_{>0} - 2a)| + |F \cap (P' + W_{\leq 0})| \\
&= \bigl|[(-F) \cap (P' + W_{>0} - 2a)] \cup [F \cap (P' + W_{\leq 0})]\bigr|
\end{aligned}$$

since F is symmetric around a, and F and $-F$ are disjoint (one lies in $H_+$ and
the other lies in $H_-$). It thus suffices to show that the sets $-F \cap (P' + W_{>0} - 2a)$
and $F \cap (P' + W_{\leq 0})$ lie in $A \cap (P + W)$. That these sets lie in A is clear, since
A contains both F and $-F$. Also observe that $(P' + W_{>0} - 2a)$ is contained in
$-(P' + W_{\leq 0})$ since $P'$ is symmetric around a. Thus it only remains to show that
$F \cap (P' + W_{\leq 0}) \subseteq P + W$. But since $F = 2a - F$ lies in $H_+$, and the corners
of $P'$ lie in F, and W lies in H, we see that both F and $P' + W$ lie in the slab
between H and $2a + H$. Thus $F \cap (P' + W_{\leq 0})$ lies in the set
$P' - \{ta : 0 \leq t \leq 2\} + W = P + W$, and the claim follows.














As a corollary we obtain 


Corollary 5.19 (Freiman cube lemma) Let A be an additive set in a d-dimensional
vector space V, and let $K = \sigma[A]$ be the doubling constant. Then
there exists a d-parallelepiped with corners in A such that |A N P| > BK” |A|. 


Proof [28] Applying Exercise 2.3.14 we have o[F] < K?. Now apply 
Lemma 5.18 with W = {0} andr = d. 














Lemma 5.13 shows, roughly speaking, that if A is an additive set in a vector
space then rank(A) is controlled by a linear function of the doubling constant $\sigma[A]$.
The following remarkable theorem shows that if one is willing to pass from A to 
a significant subset of A, then one can in fact control the rank by a logarithmic 
function of the doubling constant. 


Theorem 5.20 (Freiman $2^n$ theorem) Let $d \geq 1$, and let A be an additive set
in a vector space V with doubling constant $K = \sigma[A] < 2^d$. Then there exists a
subset $A'$ of A with $\mathrm{rank}(A') < d$ such that $\sigma[A'] \leq K$ and $|A'| = \Omega_{d,K}(|A|)$.


See [28] for further discussion, including the dependence of constants in the
$\Omega_{d,K}()$ notation.


Proof [28] We fix d and induce on K. For $K < 1$ the claim is vacuously true.
Now suppose that $K \geq 1$ and that the claim has already been proven for all doubling
constants at most $K - \varepsilon(d, K)$, for some $\varepsilon(d, K) > 0$ which is bounded from below for
K in any compact interval $\{1 \leq K \leq 2^d - \delta\}$; if we can prove the claim under
such a hypothesis, then the claim follows unconditionally by a standard continuity
argument (the set of K obeying the theorem is open, closed, and contains 1).

Fix A, V, K, and let $\varepsilon = \varepsilon(d, K)$ be chosen later. If there exists a set $A'' \subseteq A$
with $|A''| \geq (\varepsilon/K)|A|$ and $\sigma[A''] \leq K - \varepsilon$, then the claim would follow by applying
the induction hypothesis with A replaced by $A''$ and K by $K - \varepsilon$. Thus we may
assume that $\sigma[A''] > K - \varepsilon$ whenever $|A''| \geq (\varepsilon/K)|A|$. In particular we see that

$$|2A''| \geq K|A''| - \varepsilon|A| \quad \text{for all non-empty } A'' \subseteq A \qquad (5.13)$$

(treating the case of small $A''$ and large $A''$ separately: if $|A''| < (\varepsilon/K)|A|$ the
right-hand side is negative). Note that this also holds
with $A'' = \emptyset$ if we adopt the convention that $2A'' = \emptyset$ in this case.

Let $r = \mathrm{rank}(A)$. Without loss of generality we may assume that V is r-dimensional,
since otherwise we can restrict V to the affine span of A (and translate
to the origin). If A is small, say $|A| < 10K^2$, then the claim follows just by setting
$A'$ to be a single point, so assume $|A| \geq 10K^2$. By Lemma 5.13 we conclude
$r \leq K$. We will in fact show that the hypotheses on A force $r < d$, at which point
we can take $A' := A$ and be done.

We now claim that (5.13) implies the bound

$$|A \cap W| = O(\varepsilon|A|) \qquad (5.14)$$

for all affine hyperplanes W in V. To see this, observe that W divides V into the
hyperplane W and two open half-spaces $W_-$, $W_+$. Since A has full rank, at least
one of $A \cap W_+$, $A \cap W_-$ is non-empty. Let us say that $A \cap W_+$ is non-empty. Let
a be a point in $A \cap W_+$ that minimizes the distance to W. One then observes from
the convexity and disjointness of W, $W_-$, $W_+$ that the midpoint sets $\frac{1}{2} \cdot 2(A \cap W)$,
$\frac{1}{2} \cdot 2(A \cap W_+)$, $\frac{1}{2} \cdot (a + (A \cap W))$, and $\frac{1}{2} \cdot 2(A \cap W_-)$ are all disjoint. Since all
these sets are contained in $\frac{1}{2} \cdot 2A$, we see that

$$|2(A \cap W)| + |2(A \cap W_+)| + |A \cap W| + |2(A \cap W_-)| \leq |2A| = K|A|.$$

Applying (5.13) we conclude (5.14).
Next, we apply the Freiman cube lemma to obtain an r-parallelepiped P with
corners in A such that

$$|A \cap P| = \Omega_K(|A|). \qquad (5.15)$$

Comparing this with (5.14) we see that P cannot be contained in an affine hyperplane
(if $\varepsilon$ is chosen sufficiently small). Since the parallelepiped P has $2r \leq 2K$ faces,
each of which lies on an affine hyperplane, we thus see that, with int(P) denoting
the interior of P,

$$|A \cap \mathrm{int}(P)| \geq |A \cap P| - O(K\varepsilon|A|).$$

If Q denotes the $2^r$ corners of P, we observe that the sets $\{x + \mathrm{int}(P) : x \in Q\}$
are all disjoint; thus

$$|2(A \cap P)| \geq 2^r |A \cap \mathrm{int}(P)| \geq 2^r |A \cap P| - O(2^r K\varepsilon|A|). \qquad (5.16)$$

The complement $V \setminus P$ of P in V can be partitioned into 2r (unbounded)
convex regions $B_1 \cup \cdots \cup B_{2r}$ (Exercise 5.2.4). Observe from convexity and disjointness
that the midpoint sets $\frac{1}{2} \cdot 2(A \cap B_j)$ are disjoint from each other and from
$\frac{1}{2} \cdot 2(A \cap P)$. Thus

$$|2A| \geq \sum_{j=1}^{2r} |2(A \cap B_j)| + |2(A \cap P)|.$$

Applying (5.13) to each $A \cap B_j$ we conclude

$$|2(A \cap P)| \leq K|A \cap P| + 2r\varepsilon|A|.$$

Combining this with (5.16), (5.15) and using the bound $r \leq K$ we see that

$$2^r \leq K + O_K(\varepsilon).$$

By choosing $\varepsilon$ sufficiently small depending on $K < 2^d$ and d we obtain $r < d$ as
desired.














Exercises 


5.2.1 [118] Show that Lemma 5.13 is still true if A + A is replaced by A — A. 

5.2.2 Let $d \geq 1$, $B := \{0, 1\}^d \subset \mathbf{R}^d$, and A be an additive subset of the convex
hull of B (i.e. A lies in the solid unit cube $\{(x_1, \ldots, x_d) : 0 \leq x_1, \ldots, x_d \leq 1\}$).
Show that

$$|A + B| \geq (\sqrt{2} - o_{d \to \infty}(1))^d |A|.$$

(Hint: reduce to the case where A is a subset of B, and then reduce
further to the case where A consists of elements $(n_1, \ldots, n_d) \in \{0, 1\}^d$
where $n_1 + \cdots + n_d$ is fixed. Then restrict the elements of B in a similar
manner and apply the covering principle and Stirling's formula (1.52).
You may find working out the counterexample in the next exercise to be
helpful.)

5.2.3 Show that the quantity /2 in Exercise 5.2.2 cannot be improved, by 
setting A equal to those elements (n1, ..., ng) E€ B such that nı +---+ 
ng = LÉ]. 


5.2.4 Let V be an r-dimensional vector space, and let P be a r-parallelepiped 
in V which is not contained in any hyperplane. Show that V\P is the 
union of 2r unbounded convex regions (not necessarily open). 




5.3 Freiman homomorphisms 


We now introduce the fundamental concept of a Freiman homomorphism, which
allows us to transfer an additive problem in one group Z to another group Z' in a way
which is more flexible than the usual algebraic notion of group homomorphism.
Roughly speaking, Freiman homomorphisms are to additive sets as group
homomorphisms are to additive groups. To avoid confusion we shall often write
additive sets A more fully as (A, Z), where Z is the ambient group of A. 


Definition 5.21 (Freiman homomorphisms) Let $k \geq 1$, and let A, B be additive
sets with ambient groups Z and W respectively. A Freiman homomorphism $\phi$ of order
k from (A, Z) to (B, W) (or more succinctly from A to B) is a map $\phi : A \to B$
with the property that

$$a_1 + \cdots + a_k = a'_1 + \cdots + a'_k \implies \phi(a_1) + \cdots + \phi(a_k) = \phi(a'_1) + \cdots + \phi(a'_k)$$

for all $a_1, \ldots, a_k, a'_1, \ldots, a'_k \in A$. If in addition there is an inverse map $\phi^{-1} : B \to A$
which is a Freiman homomorphism of order k from (B, W) to (A, Z), then we
say that $\phi$ is a Freiman isomorphism of order k, and that (A, Z) and (B, W) are
Freiman isomorphic of order k.


For an equivalent characterization of a Freiman isomorphism, see Exer- 
cise 5.3.1. 

It is easy to verify that a Freiman homomorphism of order k will also be
Freiman homomorphic of all orders $k' \leq k$. Of course it is the $k \geq 2$ cases that
are interesting; any map from A to B will be Freiman homomorphic of order 1,
and any bijection will be Freiman isomorphic of order 1. Also, the identity map 
id from (A, Z) to (A, Z) is always a Freiman isomorphism of any order, and the 
composition of two Freiman homomorphisms (resp. isomorphisms) of order k is 
another Freiman homomorphism (resp. isomorphism) of order k; in particular, the 
relation of being Freiman isomorphic of order k is an equivalence relation. Thus 
the class of additive sets, and the Freiman homomorphisms of a fixed order k 
between them, form a category. 


Remark 5.22 We digress to give an analogy with the differential geometry of man- 
ifolds. Manifolds can either be viewed extrinsically (embedded inside an ambient 
space such as a Euclidean space $\mathbf{R}^d$) or intrinsically (as a set endowed with certain
structures such as a topology, Riemannian metric, etc.). One can easily get from 
the former viewpoint to the latter by restricting certain structures of the ambient 
space to the embedded set; reversing this procedure and embedding an intrin- 
sic manifold inside a given ambient space is often much harder. Throughout this 
book we have taken the extrinsic approach, embedding the additive set A inside 
an ambient group Z. However one could also take a purely intrinsic viewpoint,
fixing the order k of the Freiman homomorphism and viewing the additive set as
$(A, \sim_k)$, where A is now thought of as an abstract set (rather than a subset of an
additive group) and $\sim_k$ is the equivalence relation on $A^k$ defined (extrinsically) by
setting $(a_1, \ldots, a_k) \sim_k (a'_1, \ldots, a'_k)$ if and only if $a_1 + \cdots + a_k = a'_1 + \cdots + a'_k$.
This is still enough to develop the theory of Freiman homomorphism and isomor- 
phisms, and one can define notions such as sum sets, additive energy, etc. in this 
intrinsic setting. However there do not appear to be any major advantages with 
this approach, especially since the embedding problem turns out to be relatively 
easy to solve (in contrast with the situation for, say, Riemannian manifolds). See 
Exercise 5.5.6 below. 


We now give some examples of Freiman homomorphisms.

• If $\phi : Z \to Z'$ is a group homomorphism (resp. isomorphism) from one group
Z to another Z', then it induces a Freiman homomorphism (resp. isomorphism)
from (A, Z) to ($\phi(A)$, Z') of arbitrary order. In particular, the reflection map
$\phi : Z \to Z$ defined by $\phi(x) := -x$ is a Freiman isomorphism from (A, Z) to
($-A$, Z) of arbitrary order.

• If (A, Z) and (B, W) are two additive sets such that $Z \subseteq W$ and $A \subseteq B$, then
the inclusion map $\iota : A \to B$ is a (rather trivial) Freiman homomorphism of
arbitrary order. Thus, if $\phi : (B, W) \to (B', W')$ is any Freiman
homomorphism, then the restriction $\phi|_A : (A, Z) \to (B', W')$ will be a
Freiman homomorphism of the same order.

• If $x \in Z$, then the translation map $\phi : Z \to Z$ defined by $\phi(y) := y + x$ is a
Freiman isomorphism from (A, Z) to (A + x, Z) of any order.

• Let $N, M \geq 1$ be integers. Let $\phi : \mathbf{Z} \to \mathbf{Z}_M$ be the canonical quotient
homomorphism, and let $\psi : [0, N) \to \phi([0, N))$ be the restriction of $\phi$ to
[0, N). Then $\psi$ is a Freiman homomorphism of any order. But $\psi$ is only a
Freiman isomorphism of order k when $M > kN$, in which case $\psi^{-1}$ is also a
Freiman isomorphism. Thus it is possible to have a Freiman isomorphism
between a set in a torsion-free group and a set in a torsion group, which would
be impossible if one were only considering group homomorphisms.

• Let a, r be elements of an additive group Z, and let $P := a + [0, N) \cdot r$ be the
arithmetic progression $P = \{a, a + r, \ldots, a + (N - 1)r\}$. Then the map
$\phi : [0, N) \to P$ defined by $\phi(n) := a + nr$ is a Freiman homomorphism from
([0, N), $\mathbf{Z}$) to (P, Z) of any order. It is a Freiman isomorphism of order k if and
only if $\mathrm{ord}(r) > kN$. In particular, if r is non-zero and Z is torsion-free, then $\phi$
is a Freiman isomorphism of all orders.

• Let $N, M, d \geq 1$ be integers, and let $\phi : \mathbf{Z}^d \to \mathbf{Z}$ be the map
$\phi(a_1, \ldots, a_d) := \sum_{j=1}^{d} a_j M^{j-1}$. Then the map $\phi$ is a Freiman homomorphism from $[0, N)^d$ to
$\phi([0, N)^d)$ of any order, and is a Freiman isomorphism of order k when
$M > kN$.

• The sets {0, 1, 10, 11} and {0, 1, 100, 101} in $\mathbf{Z}$ are Freiman isomorphic of
order k for any $k < 10$, but are not Freiman isomorphic of order k for any
$k \geq 10$.
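
For small sets of integers, Definition 5.21 can be checked directly by enumerating k-fold sums. The following is a minimal brute-force sketch of that check (the function names are ours and hypothetical; it is only practical for small sets and small k), applied to the last example above.

```python
from itertools import combinations_with_replacement, permutations


def is_freiman_hom(phi, k):
    """phi is a dict from a finite set of integers A to integers.
    Return True if equal k-fold sums in A always have equal image sums."""
    image_sum_of = {}
    # Sums are commutative, so multisets of size k cover all k-tuples.
    for combo in combinations_with_replacement(sorted(phi), k):
        s = sum(combo)
        t = sum(phi[a] for a in combo)
        if image_sum_of.setdefault(s, t) != t:
            return False
    return True


def is_freiman_iso(phi, k):
    """True if phi is injective and both phi and its inverse are
    Freiman homomorphisms of order k."""
    inverse = {value: key for key, value in phi.items()}
    if len(inverse) != len(phi):
        return False
    return is_freiman_hom(phi, k) and is_freiman_hom(inverse, k)


def freiman_isomorphic(A, B, k):
    """Try every bijection between two small sets of integers."""
    A, B = sorted(A), sorted(B)
    if len(A) != len(B):
        return False
    return any(is_freiman_iso(dict(zip(A, perm)), k) for perm in permutations(B))


if __name__ == "__main__":
    A = {0, 1, 10, 11}
    B = {0, 1, 100, 101}
    print(freiman_isomorphic(A, B, 9))   # order 9
    print(freiman_isomorphic(A, B, 10))  # order 10
```

The obstruction at order 10 is the relation $1 + \cdots + 1 = 10 + 0 + \cdots + 0$ (ten terms on each side) in the first set, which has no counterpart in the second.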


The relevance of Freiman homomorphisms to the theory of sum sets lies in the 
following lemma: 


Lemma 5.23 Let (A, G) be an additive set, and let $\phi : (A, G) \to (\phi(A), H)$ be a
surjective Freiman homomorphism of order k. Then we have

$$|\varepsilon_1 \phi(A_1) + \cdots + \varepsilon_k \phi(A_k)| \leq |\varepsilon_1 A_1 + \cdots + \varepsilon_k A_k|$$

whenever $A_1, \ldots, A_k$ are non-empty subsets of A and $\varepsilon_1, \ldots, \varepsilon_k = \pm 1$. If $\phi$ is
in fact a Freiman isomorphism of order k, then we may replace inequality with
equality. In particular, if A and B are Freiman isomorphic of order k, then

$$|lB - mB| = |lA - mA| \quad \text{whenever } l, m \geq 0 \text{ and } l + m \leq k.$$

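Proof Define an equivalence relation $\sim$ on $A_1 \times \cdots \times A_k$ by declaring

$$(a_1, \ldots, a_k) \sim (a'_1, \ldots, a'_k) \iff \varepsilon_1 a_1 + \cdots + \varepsilon_k a_k = \varepsilon_1 a'_1 + \cdots + \varepsilon_k a'_k.$$

Observe that the number of equivalence classes in $A_1 \times \cdots \times A_k$ is precisely
$|\varepsilon_1 A_1 + \cdots + \varepsilon_k A_k|$. Also observe that we can rewrite the above condition

$$\varepsilon_1 a_1 + \cdots + \varepsilon_k a_k = \varepsilon_1 a'_1 + \cdots + \varepsilon_k a'_k$$

in a positive form as

$$\sum_{j : \varepsilon_j = 1} a_j + \sum_{j : \varepsilon_j = -1} a'_j = \sum_{j : \varepsilon_j = 1} a'_j + \sum_{j : \varepsilon_j = -1} a_j.$$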

From this it is clear that the equivalence relation is respected by any Freiman 
homomorphism of order k. Combining these observations yields the lemma. 
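
The invariance of $|lA - mA|$ can be seen numerically on the pair {0, 1, 10, 11} and {0, 1, 100, 101} from the examples earlier in this section. The sketch below (with a hypothetical helper `iterated_sumset`) computes these cardinalities; they agree while $l + m \leq 9$ and first differ at $l + m = 10$.

```python
def iterated_sumset(A, l, m):
    """Return lA - mA for a finite set A of integers, with 0A = {0}."""
    result = {0}
    for _ in range(l):
        result = {s + a for s in result for a in A}
    for _ in range(m):
        result = {s - a for s in result for a in A}
    return result


if __name__ == "__main__":
    A = {0, 1, 10, 11}
    B = {0, 1, 100, 101}
    for l, m in [(1, 1), (2, 0), (5, 4), (9, 0), (10, 0)]:
        size_a = len(iterated_sumset(A, l, m))
        size_b = len(iterated_sumset(B, l, m))
        print(f"l={l}, m={m}: |lA - mA| = {size_a}, |lB - mB| = {size_b}")
```

For $l + m \leq 9$ both cardinalities equal $(l + m + 1)^2$, since each set then behaves like $\{0, 1\}^2$ with no carrying between the two "digits".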














Thus Freiman isomorphisms will preserve the cardinality of iterated sum and 
difference sets (as well as related quantities such as the doubling constant, differ- 
ence constant, and energy); see Exercise 5.3.5. Of course, in many applications 
one wants to take sum sets involving two additive sets A, B in an ambient group
Z rather than one. One way to resolve this is to work with the union $A \cup B$, since
Lemma 5.23 then shows that Freiman isomorphisms of $A \cup B$ will preserve the
cardinality of sets such as $A + B$ or $A - B$ (if the order of the isomorphism is at
least 2). But this has the slight drawback that one loses the freedom to translate
A and B independently. One way to get around this is to define the disjoint union
$A \uplus B$ of A and B, defined in the ambient group $Z \times \mathbf{Z}$ as

$$A \uplus B := (A \times \{0\}) \cup (B \times \{1\}).$$

Then any Freiman isomorphism of the disjoint union will preserve sum sets (see
Exercise 5.3.7). Note that the obvious projection map from $A \uplus B$ to $A \cup B$ is a
Freiman homomorphism of any order.

Freiman homomorphisms also preserve the property of being a progression: 


Proposition 5.24 Let $\phi : A \to B$ be a Freiman homomorphism of order at least
2, and let $P = a + [0, N] \cdot v$ be a progression in A. Then $\phi(P)$ is a progression in
B with the same rank, dimensions, and volume as P. Furthermore, if $\phi$ is in fact
a Freiman isomorphism of order at least 2, then $\phi(P)$ is proper if and only if P is
proper.


Proof We may assume that the components $N_j$ of N are all strictly positive, since
if one of the components $N_j$ is zero then we can simply remove it and lower the
rank by 1. By translation invariance we may suppose that the base point a is equal
to 0, and that $\phi(0)$ is also zero. In particular P, and thus A, contains all the basis
vectors $v_1, \ldots, v_d$.

Since $\phi$ is a Freiman homomorphism of order 2 and $\phi(0) = 0$, we see that
$\phi(x + v_j) = \phi(x) + \phi(v_j)$ whenever x and $x + v_j$ both lie in A and $1 \leq j \leq d$. Iterating
this we see from induction that $\phi(n \cdot v) = n \cdot \phi(v)$ for any $n \in [0, N]$, where
$\phi(v) \in B^d$ is the d-tuple $\phi(v) := (\phi(v_1), \ldots, \phi(v_d))$. Thus $\phi(P) = [0, N] \cdot \phi(v)$
and is thus a progression with the same rank, dimensions, and volume as P. To
prove the last part of the proposition, observe that if $\phi$ is a Freiman isomorphism
then $|P| = |\phi(P)|$, and hence $|P| = |[0, N]|$ if and only if $|\phi(P)| = |[0, N]|$.














We now show that torsion-free additive groups are no richer than the integers, 
for the purposes of understanding sums and differences of finite sets. 


Lemma 5.25 Let A be a finite subset of a torsion-free additive group Z. Then for
any integer k, there is a Freiman isomorphism $\phi : A \to \phi(A)$ of order k to some
finite subset $\phi(A)$ of the integers $\mathbf{Z}$. The same is true if we replace $\mathbf{Z}$ by $\mathbf{Z}_N$, if N
is sufficiently large depending on A.


Note that the converse is trivial: one can always embed the integers in any
other torsion-free additive group, and hence any additive set in the integers can be
embedded in any other torsion-free additive group such as $\mathbf{R}^d$. However, many of
these embeddings are trivial, living in some subspace of $\mathbf{R}^d$. The question of the
largest dimension one can “non-trivially” embed an additive set in will lead to the 
concept of Freiman dimension, which we shall study in Section 5.5. 


Proof By Corollary 3.6 we may take $Z = \mathbf{Z}^n$ for some $n \geq 0$. By translating
A we may assume that A in fact lives in $(\mathbf{Z}^+)^n$, i.e. all the coordinates are non-negative.
Since A is finite, we see that A is a subset of $[0, M/k)^n$ for some large
integer M (a multiple of k). Now define the map $\phi : A \to \mathbf{Z}$ by

$$\phi(a_1, \ldots, a_n) := a_1 + a_2 M + a_3 M^2 + \cdots + a_n M^{n-1}.$$

In other words, we view elements of A as digit strings of integers base M. This is
a Freiman isomorphism of order k (with $\phi_k$ being defined the same way as $\phi$, but
restricted to $kA$); the point is that if M is large enough we never have to "carry" a
digit. This shows that we can map A to the integers via a Freiman isomorphism;
the same argument shows that we can map to $\mathbf{Z}/(N \cdot \mathbf{Z})$ if $N > M^n$.
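
The base-M encoding in this proof is easy to carry out explicitly. The sketch below is a minimal illustration for finite subsets of $\mathbf{Z}^n$ given as integer tuples; the function names and the particular choice $M = k(\text{top} + 1)$ are our assumptions, made so that M is a multiple of k and every shifted coordinate lies in $[0, M/k)$. It then verifies the order-k isomorphism property by brute force.

```python
from itertools import combinations_with_replacement


def digit_map(A, k):
    """Translate A into the non-negative orthant and read each point as a
    digit string in base M. Returns (phi, M) with phi a dict point -> integer."""
    A = list(A)
    n = len(A[0])
    lows = [min(a[i] for a in A) for i in range(n)]
    shifted = {a: tuple(a[i] - lows[i] for i in range(n)) for a in A}
    top = max(max(v) for v in shifted.values())
    M = k * (top + 1)  # multiple of k; all coordinates are < M / k
    phi = {a: sum(c * M**i for i, c in enumerate(v)) for a, v in shifted.items()}
    return phi, M


def is_freiman_iso_of_order(phi, k):
    """Check that k-fold vector sums coincide exactly when the k-fold sums
    of their images coincide (this also forces phi to be injective)."""
    forward, backward = {}, {}
    for combo in combinations_with_replacement(list(phi), k):
        vec_sum = tuple(map(sum, zip(*combo)))
        int_sum = sum(phi[a] for a in combo)
        if forward.setdefault(vec_sum, int_sum) != int_sum:
            return False
        if backward.setdefault(int_sum, vec_sum) != vec_sum:
            return False
    return True


if __name__ == "__main__":
    A = {(0, 0), (1, 0), (0, 1), (2, 3), (-1, 2)}
    phi, M = digit_map(A, 3)
    print("base M =", M)
    print("order-3 Freiman isomorphism:", is_freiman_iso_of_order(phi, 3))
```

A sum of k digits, each below $M/k$, stays below M, which is exactly the "no carry" condition the proof relies on.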














As we shall see later, the machinery of Freiman homomorphisms and Freiman 
isomorphisms will also be very useful when dealing with torsion groups, for 
instance we can use it to pass from a problem on the integers to a problem on 
a cyclic group or vice versa. If one is willing to only work with a fixed fraction of 
an additive set A, then the following compression lemma allows one to work in a 
cyclic group whose order is only a little bit larger than that of A itself. 


Lemma 5.26 [295] Let A be an additive set whose ambient group Z is either
torsion-free or a cyclic group of prime order, and let $n \geq 1$ be a positive integer.
Let N be an integer such that

$$2n|nA - nA| < N < |Z|$$

(note the condition $N < |Z|$ is vacuous if Z is torsion-free). Then there exists a
subset $A' \subseteq A$ of cardinality $|A'| \geq |A|/n$ and a Freiman isomorphism $\phi : A' \to B$
of order n from $A'$ to a subset $B \subseteq \mathbf{Z}_N$.


Proof By Lemma 5.25 it suffices to consider the case where Z is a cyclic group
$\mathbf{Z}_p$ of prime order.

We shall use the first moment method. Let $\lambda \in \mathbf{Z}_p \setminus \{0\}$ be an invertible element
of $\mathbf{Z}_p$ chosen uniformly at random. The map $x \mapsto \lambda \cdot x$ is thus an additive group
isomorphism on $\mathbf{Z}_p$, and is in particular a Freiman isomorphism on $\mathbf{Z}_p$ of all orders.
This freedom to dilate A by an arbitrary amount will be needed to avoid a certain
"collision" problem which will become apparent shortly.

We now define the projection $\pi : \mathbf{Z}_p \to \mathbf{Z}_N$ by setting

$$\pi(m) := \iota(m) \bmod N,$$

where $\iota : \mathbf{Z}_p \to [0, p)$ is the obvious map that sends the residue class $m + (p \cdot \mathbf{Z})$
to m for $m = 0, \ldots, p - 1$.

The map $\pi$ is not quite an additive homomorphism; however note, for
$j = 0, 1, \ldots, n - 1$, that $\pi$ is a Freiman homomorphism of order n when restricted to
the set $Z_j := (jp/n, (j + 1)p/n]$, which is a set that occupies roughly $\frac{1}{n}$ of the
original field $\mathbf{Z}_p$. By the pigeonhole principle, for each $\lambda$, there exists a
$0 \leq j = j(\lambda) < n$ such that the set $A' := \lambda \cdot A \cap Z_j$ has cardinality $|A'| \geq |A|/n$. Thus
if we set $B := \pi(A') \subseteq \mathbf{Z}_N$, then the map $\pi : A' \to B$ is a surjective Freiman
homomorphism of order n.

We are almost done; however we have not established that $\pi$ is a Freiman
isomorphism. The only possible obstruction is that there may be collisions in $nA'$,
in the sense that

$$\pi(x_1) + \cdots + \pi(x_n) = \pi(x'_1) + \cdots + \pi(x'_n)$$

while $x_1 + \cdots + x_n \neq x'_1 + \cdots + x'_n$, for some $x_1, \ldots, x_n, x'_1, \ldots, x'_n \in A'$. Fortunately,
this type of collision rarely occurs, if N is large enough and $\lambda$ is chosen
randomly. Indeed, if we do have the above collision, then we see that

$$\iota(x_1) + \cdots + \iota(x_n) - \iota(x'_1) - \cdots - \iota(x'_n)$$

must be a non-zero multiple of N. Since $x_1, \ldots, x_n, x'_1, \ldots, x'_n$ lie in $A'$, and hence
in $\lambda A$, we thus see that a collision can only occur if $n\iota(\lambda A) - n\iota(\lambda A)$ contains a
non-zero multiple of N. However, we can compute the probability that this occurs:


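$$\begin{aligned}
\mathbf{P}\bigl(\exists k \in \mathbf{Z} \setminus 0 : kN \in n\iota(\lambda A) - n\iota(\lambda A)\bigr)
&\leq \sum_{|k| < np/N;\, k \neq 0} \mathbf{P}\bigl(kN \in n\iota(\lambda A) - n\iota(\lambda A)\bigr) \\
&\leq \sum_{|k| < np/N;\, k \neq 0} \mathbf{P}\bigl(kN + p \cdot \mathbf{Z} \in n\lambda A - n\lambda A\bigr) \\
&= \sum_{|k| < np/N;\, k \neq 0} \; \sum_{x \in nA - nA} \mathbf{P}(kN = \lambda x \bmod p) \\
&= \sum_{|k| < np/N;\, k \neq 0} \; \sum_{x \in nA - nA} \mathbf{P}\bigl(\lambda^{-1} = (kN)^{-1} x \bmod p\bigr) \\
&\leq \sum_{|k| < np/N;\, k \neq 0} \; \sum_{x \in nA - nA} \frac{1}{p - 1} \\
&\leq \frac{2np}{N} \, |nA - nA| \, \frac{1}{p - 1}
\end{aligned}$$

where we have used the fact that p is prime (to invert kN modulo p). By our
hypotheses on N we thus see that this probability is strictly less than 1. Thus we
may choose $\lambda$ so that $\pi : A' \to B$ will be a Freiman isomorphism of order n as
claimed.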














The above argument should be compared with the proof of Theorem 1.3. 
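
For small primes, the dilate-and-project construction in this proof can be carried out by exhaustive search instead of random choice. The following sketch is an illustration under stated assumptions rather than the authors' procedure: it tries every dilation $\lambda$ in turn, uses half-open blocks $[jp/n, (j+1)p/n)$ of representatives (a slight change from the intervals in the proof, made so that the representative 0 is covered), and verifies the isomorphism property by brute force. All function names are hypothetical.

```python
from itertools import combinations_with_replacement


def n_fold_difference_set(A, p, n):
    """Return nA - nA inside Z_p."""
    S = {0}
    for _ in range(n):
        S = {(s + a) % p for s in S for a in A}
    for _ in range(n):
        S = {(s - a) % p for s in S for a in A}
    return S


def is_freiman_iso_mod(pi, n, p, N):
    """pi maps residues mod p to residues mod N. Check that n-fold sums agree
    mod p exactly when the n-fold sums of their images agree mod N."""
    forward, backward = {}, {}
    for combo in combinations_with_replacement(sorted(pi), n):
        s = sum(combo) % p
        t = sum(pi[x] for x in combo) % N
        if forward.setdefault(s, t) != t or backward.setdefault(t, s) != s:
            return False
    return True


def compress(A, p, n, N):
    """Search for a dilation lam and a block index j such that x -> x mod N
    is a Freiman isomorphism of order n on the part of lam * A in block j.
    Returns (lam, A_prime, phi) with phi[a] = ((lam * a) mod p) mod N, or None."""
    for lam in range(1, p):
        for j in range(n):
            # Keep a when its dilated representative x satisfies j*p/n <= x < (j+1)*p/n.
            block = [a for a in sorted(A)
                     if j * p <= n * ((lam * a) % p) < (j + 1) * p]
            if n * len(block) < len(A):
                continue
            pi = {(lam * a) % p: ((lam * a) % p) % N for a in block}
            if is_freiman_iso_mod(pi, n, p, N):
                return lam, block, {a: pi[(lam * a) % p] for a in block}
    return None


if __name__ == "__main__":
    p, n, N = 101, 2, 90
    A = {0, 20, 40, 60, 80, 100}
    print("|nA - nA| =", len(n_fold_difference_set(A, p, n)))
    print(compress(A, p, n, N))
```

In this example $2n|nA - nA| = 84 < 90 < 101$, so the hypothesis of Lemma 5.26 holds, and the search already succeeds at $\lambda = 1$ with the block containing 0, 20 and 40.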


Exercises 


5.3.1 Let $\phi : A \to B$ be a map between two additive sets, and let $k \geq 1$. Show
that $\phi$ is a Freiman isomorphism of order k if and only if $\phi$ is surjective
and

$$a_1 + \cdots + a_k = a'_1 + \cdots + a'_k \iff \phi(a_1) + \cdots + \phi(a_k) = \phi(a'_1) + \cdots + \phi(a'_k)$$

for all $a_1, \ldots, a_k, a'_1, \ldots, a'_k \in A$.

5.3.2 [257] Let $n > 1$. Show that {0, 1, n + 1} is Freiman isomorphic to
{0, 1, n} of order n but not n + 1.

5.3.3 Show that given any $k \geq 1$ and any additive set A, A is Freiman
isomorphic of order k to some subset of a finite abelian group.

5.3.4 Let (A, Z) and (B, W) be additive sets, and let $\phi : A \to B$ be a map
which is a Freiman homomorphism of every order k. Suppose also that
Z is the group generated by A. Show that there exists a unique group
homomorphism $\psi : Z \to W$ and an element $c \in W$ such that
$\phi(x) = \psi(x) + c$ for all $x \in A$.

5.3.5 Let (A, Z) and (B, W) be Freiman isomorphic of order at least 2. Show
that $\sigma[A] = \sigma[B]$, that $\delta[A] = \delta[B]$, and that $E(A, A) = E(B, B)$. For
any $\alpha \in \mathbf{R}$, show that $|\mathrm{Sym}_\alpha(A)| = |\mathrm{Sym}_\alpha(B)|$. (See Definitions 2.4, 2.8,
2.32 for the meanings of these terms.)

5.3.6 Let (A, Z) and (B, W) be additive sets which contain the origin 0, and
let $\phi : (A, Z) \to (B, W)$ be a Freiman isomorphism of order at least 3
which fixes the origin, thus $\phi(0) = 0$. Show that for any $K \geq 1$, A
is a K-approximate group if and only if B is. Show that if one replaces
"K-approximate group" by "translate of a K-approximate group" then
one can drop the requirement that $\phi(0) = 0$ and that A, B contain 0.

5.3.7 Let (A, Z), (B, Z), (A', Z'), (B', Z') be additive sets, and suppose that
$\phi : A \uplus B \to A' \uplus B'$ is a Freiman isomorphism of order k which maps
A to A' and B to B'. Show that
$|n_1 A - n_2 A + n_3 B - n_4 B| = |n_1 A' - n_2 A' + n_3 B' - n_4 B'|$
whenever $|n_1| + |n_2| + |n_3| + |n_4| \leq k$. If $k \geq 2$,
show that $d(A, B) = d(A', B')$ and $E(A, B) = E(A', B')$. Also, show
that A can be covered by K translates of B if and only if A' can be
covered by K translates of B'.

5.3.8 Suppose that two additive sets A and B are Freiman isomorphic of order
k. If $n, m, k' > 0$ are such that $k'(n + m) \leq k$, show that $nA - mA$ and
$nB - mB$ are Freiman isomorphic of order $k'$.

5.3.9 Show that all Sidon sets of a fixed cardinality N are Freiman isomorphic
of order 2 to each other. More generally, for any $h \geq 2$, show that all $B_h$
sets of cardinality N are Freiman isomorphic to each other of order h,
and that the image of a $B_h$ set under a Freiman isomorphism is still a
$B_h$ set. Thus one could work with a "standard" $B_h$ set of order N, such
as the basis $e_1, \ldots, e_N$ of $\mathbf{Z}^N$, and many additive results concerning that
standard set would automatically transfer over to an arbitrary $B_h$ set.

5.3.10 Let (A, Z) and (A', Z') be additive sets in finite additive groups Z, Z'
which are Freiman isomorphic of order h for some $h \geq 1$. Show that
$\|A\|_{\Lambda(2h)} = \|A'\|_{\Lambda(2h)}$, where the $\Lambda(p)$ constants are as in Definition 4.26.

5.3.11 [29] Let p be a prime, let $k \geq 1$, and let (A, $\mathbf{Z}_p$) be an additive set in
$\mathbf{Z}_p$ such that $|A| < \log_{2k} p$. Show that there exists an additive set (A', $\mathbf{Z}$)
such that the canonical projection map from $\mathbf{Z}$ to $\mathbf{Z}_p$ is a Freiman isomorphism
of order k from A' to A. (Hint: the claim is obvious if A is
contained in the arithmetic progression $[-p/2k, p/2k] \cdot 1$ in $\mathbf{Z}_p$. For the
general case, use the Kronecker approximation theorem (Corollary 3.25)
to locate an integer n coprime to p such that $n \cdot A$ lies in this progression
$[-p/2k, p/2k] \cdot 1$, and then find an integer m with $nm = 1 \pmod{p}$ to
"invert" the dilation $x \mapsto n \cdot x$.)

5.3.12 [29] Let p be a prime, written in binary as $p = 2^{n_1} + \cdots + 2^{n_r}$ where
$n_1 < \cdots < n_r$. Let (A, $\mathbf{Z}_p$) be the additive set


A := {0} U {1,2}, ..., OO 4. ON eet Pep


Show that $|A| < 2\log_2 p + 1$, but there does not exist any set of integers
A' which is Freiman isomorphic of order 2 to A. This shows that the
estimate $|A| < \log_{2k} p$ in Exercise 5.3.11 is very close to being sharp.

5.3.13 Let (A, Z), (B, Z) be additive sets such that $A + B$ can be covered by
K translates of A for some $K \geq 1$, and let $\phi : A \uplus B \to C$ be a Freiman
homomorphism of order 4. Show that $\phi(A) + \phi(B)$ can be covered by K
translates of $\phi(A)$.

5.3.14 Let Q be a progression of rank d, let $k \geq 1$, and let $N > k^d|Q|$. Show that
there exists an additive set (Q', $\mathbf{Z}_N$) in the cyclic group $\mathbf{Z}_N$ and a surjective
Freiman homomorphism $\phi : Q' \to Q$ of order k. If Q is proper, one can
also ensure that $\phi$ is injective. This fact is useful for viewing progressions
as dense subsets of cyclic groups.


5.4 Torsion and torsion-free inverse theorems 


We can now use all the machinery developed thus far to prove two inverse sum set
theorems, one in the setting of r-torsion groups and one in the setting of torsion-free
groups. The two arguments are quite different, but they will be combined to
obtain an inverse sum set theorem for an arbitrary group in Section 5.6.

We begin with the r-torsion case. 


Theorem 5.27 (Freiman theorem for r-torsion groups) [300], [154] Suppose
A is an additive set in an r-torsion group Z such that |A + A| ≤ K|A| or


228 5 Inverse sum set theorems 


|A − A| ≤ K|A|. Then there exists a subgroup H of Z of cardinality
|A| ≤ |H| ≤ r^{K^{O(1)}} |A| such that A is contained in a translate of H.


Proof By Proposition 2.26 we can find a K^{O(1)}-approximate group H such that
A is contained in a translate of H. But then H + H ⊆ H + X for some additive
set X of cardinality at most K^{O(1)}. We conclude that the set G := H + ⟨X⟩ is a
genuine group, where ⟨X⟩ is the group generated by X. But from the r-torsion
hypothesis we have |⟨X⟩| ≤ r^{|X|} ≤ r^{K^{O(1)}}, and the claim follows.

Remark 5.28 The upper bound on |G| has been improved to r2<’! in [154], 
using the Green—Ruzsa covering lemma and the Pliinnecke inequalities; see Exer- 
cise 5.4.1. The exponential dependence in K here is necessary, as the example 
Z= Zz. A = {e,...,ex} shows. However if one relaxes the claim that A is 
completely contained in a translate of H then one should do better. For instance,
it is conjectured by Marton [300] that in the above setting we can in fact find a
group H ⊆ Z of cardinality at most |A| such that A can be covered by O(K^{O(1)})
translates of H. This would be sharp up to polynomial losses, since in that case
one can easily verify that |A + A|, |A − A| = O(K^{O(1)} |A|).
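
The exponential dependence on K discussed in Remark 5.28 can be observed numerically. The following is a minimal sketch only, under the assumption r = 2: it takes A to be the standard basis of the 2-torsion group (Z/2Z)^n, encodes vectors as bitmasks (so that addition is XOR), and compares |A + A| with the size of the smallest subgroup having a translate that contains A. The function names are illustrative and do not come from the text.

```python
def gf2_rank(vectors):
    """Rank over GF(2) of vectors encoded as integer bitmasks."""
    basis = []
    for v in vectors:
        for b in basis:
            # XOR with b exactly when that clears the leading bit of b from v.
            v = min(v, v ^ b)
        if v:
            basis.append(v)
    return len(basis)


def basis_example(n):
    """Return (|A + A|, |A|, |H|) for A = standard basis of (Z/2Z)^n.

    H is the subgroup generated by A - a_0; its coset a_0 + H is the
    smallest coset of a subgroup that contains A.
    """
    A = [1 << i for i in range(n)]
    sumset = {a ^ b for a in A for b in A}       # addition in (Z/2Z)^n is XOR
    h_dim = gf2_rank([a ^ A[0] for a in A])      # dimension of span(A - a_0)
    return len(sumset), len(A), 2 ** h_dim


if __name__ == "__main__":
    for n in (4, 8, 12, 16):
        sumset_size, size, h_size = basis_example(n)
        print(n, sumset_size / size, h_size / size)
```

In this sketch the doubling constant |A + A|/|A| grows linearly in n while |H|/|A| grows like 2^(n−1)/n, which is the behaviour the remark describes.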

As a corollary we can also obtain a Chang-type theorem in the r-torsion case.


Corollary 5.29 (Chang theorem for r-torsion groups) Suppose A is an additive
set in an r-torsion group Z such that E(A, A) ≥ |A|^3/K. Then 2A − 2A
contains a subgroup of Z of cardinality at least r^{−O(K^{O(1)})} |A|.


Proof We may take r ≥ 2 as the case r = 1 is trivial. Using the Balog–
Szemerédi–Gowers theorem (Theorem 2.31) and translating A if necessary, we
may find a subset A' of A with |A'| = Ω(K^{−O(1)} |A|) which is contained in a
K^{O(1)}-approximate group G of size |G| = O(K^{O(1)} |A|). Using Theorem 5.27 we may
place the approximate group G inside a genuine group H of cardinality at most 
Ke) Als thus Py(A’) > r-*°”. By Proposition 4.39, we thus see that 2A’ — 2A’ 
contains a Bohr set Bohry(Spec,(A’), 1) for some a = Q(K~?)., Using 
Lemma 4.36 as in the proof of Theorem 4.42, we conclude that 2A’ — 2A’ (and 
hence 2A — 2A) contains a Bohr set Bohry(S, ag) for some set of frequencies 
S ⊆ H with |S| = O(K^{O(1)}). In particular, it contains the subgroup Bohr_H(S, 0).
But as H is an r-torsion group, Bohr_H(S, 0) = Bohr_H(S, 1/r), and so from (4.25)
we see that


|Bohry(S, O)| =r Oe?) 


> pO AN 
= (rR) K-00] A) 


and the claim follows (using the hypothesis r ≥ 2 to absorb the lower order terms).


We now turn to the torsion-free case. We begin with two preliminary results of 
interest in their own right. The first exploits all the above machinery of Freiman 
homomorphisms, as well as the powerful techniques of harmonic analysis from 
Chapter 4 and the additive geometry results in Chapter 3 (as encapsulated in 
Theorem 4.42), to show that if A has small doubling, then 2A — 2A contains a 
large proper progression. 


Theorem 5.30 (Ruzsa–Chang theorem) [295], [48] Let A be an additive set in
a torsion-free additive group Z such that |A + A| ≤ K|A| for some K ≥ 1. Then
2A − 2A contains a proper symmetric progression P of rank O(K(1 + log K))
such that |P| ≥ e^{−O(K(1 + log^2 K))} |A|.


Proof Let p be the first prime number larger than 16|8A − 8A|. By Corollary 2.23
and Bertrand's postulate (Exercise 1.10.3) one can then find a subset A' of A of
cardinality |A'| ≥ |A|/8, which is Freiman isomorphic of order 8 to an additive
set B in Z_p. Observe that

    |B + B| = |A' + A'| ≤ |A + A| ≤ K|A| ≤ 8K|B|

so B has doubling constant at most 8K. Applying Theorem 4.42 we then obtain a
proper symmetric progression Q inside 2B − 2B of rank at most O(K(1 + log K))
and cardinality at least O(K(1 + log K))^{−O(K(1 + log K))} |B|. In particular we have

    |Q| ≥ e^{−O(K(1 + log^2 K))} |B|.

Since A' is Freiman isomorphic to B of order 8, 2A' − 2A' is Freiman isomorphic
to 2B − 2B of order 2 (see Exercise 5.3.8). Thus 2A' − 2A', and hence 2A − 2A,
contains a proper symmetric progression P which is Freiman isomorphic to Q,
and the claim follows.

The second result is a variant of the Ruzsa covering lemma which gives good 
constants when the doubling constant is small. 


Lemma 5.31 (Chang's covering lemma) [48] Let K, K' ≥ 1, and let A, B be
additive sets in an ambient group Z such that |nA| ≤ K^n |A| for all n ≥ 1,
and such that |A + B| ≤ K'|B|. Then, for any a_0 ∈ A, there exist elements
v_1, ..., v_d in A − A with d = 2K(1 + log_2(KK')) such that
A ⊆ B − B + [0, 1]^d · (v_1, ..., v_d) + a_0.


Proof Without loss of generality we may take K to be an integer. By translation
we may take a_0 = 0. We construct a sequence of enlargements B = B_0 ⊆ B_1 ⊆
··· ⊆ B_N by iterating the argument of Lemma 2.14 as follows. Set B_0 := B. Now
suppose inductively that n ≥ 0 and B_n has already been constructed. Consider the
collection {a + B_n : a ∈ A} of translates of B_n by elements of A. If we can find at
least 2K such translates which are disjoint, we set B_{n+1} to be the union of these 2K
translates; thus B_{n+1} = B_n + A_n for some subset A_n of A of cardinality 2K, and
|B_{n+1}| = 2K|B_n|, and then continue the algorithm. If we cannot find 2K disjoint
translates, we select a family of disjoint translates of maximal cardinality, set B_{n+1}
to be the union of these translates, and then halt the algorithm, setting N := n + 1.
Thus in the terminating case we have B_{n+1} = B_n + A_n, where A_n is a subset of A
of cardinality less than 2K.

Let us first see why this algorithm even terminates. By induction we see that
B_n ⊆ B + nA for all 0 ≤ n ≤ N, but we also have |B_n| = (2K)^n |B| for all
0 ≤ n < N. On the other hand, from Lemma 2.6, we have

    |B + nA| ≤ |B + A| |A + nA| / |A| ≤ K' K^{n+1} |B|.

Thus the algorithm must terminate by the time (2K)^n exceeds K'K^{n+1}, and we
therefore have the bound N ≤ 1 + log_2(KK').

Now let a be any element of A. Observe that B_{N−1} + a cannot be disjoint from
B_N, since otherwise we could have added it to the collection of disjoint translates
comprising B_N. Thus a ∈ B_N − B_{N−1} for all a ∈ A, and hence

    A ⊆ B_N − B_{N−1} = B − B + A_0 − A_0 + A_1 − A_1 + ··· + A_{N−2} − A_{N−2} + A_{N−1}.

By Lemma 3.11, we see that each of the A_j (or −A_j) can be contained in a
progression of the form [0, 1]^{d_j} · v for some d_j ≤ 2K, where the components of v
lie in A_j and hence in A − A (since 0 ∈ A and A_j ⊆ A). The claim then follows
from several applications of (3.2).
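
The enlargement procedure in this proof can be sketched in code. The version below is an illustration only and makes two assumptions not in the text: the ambient group is the integers, and the disjoint translates are chosen greedily in increasing order of a. A greedy family need not have maximal cardinality, but it cannot be extended by a further disjoint translate, which is the only property the covering step A ⊆ B_N − B_{N−1} uses. The names `chang_enlargements` and `covers` are hypothetical.

```python
def chang_enlargements(A, B, K):
    """Greedy version of the enlargement procedure for finite sets of integers.

    Returns (stages, picks), where stages[n] plays the role of B_n and
    picks[n] the role of A_n, so that stages[n + 1] = stages[n] + picks[n].
    """
    A = sorted(set(A))
    stages = [set(B)]
    picks = []
    while True:
        current = stages[-1]
        chosen, union = [], set()
        for a in A:
            translate = {a + b for b in current}
            if union.isdisjoint(translate):
                chosen.append(a)
                union |= translate
                if len(chosen) == 2 * K:
                    break              # found 2K disjoint translates: continue
        picks.append(chosen)
        stages.append(union)
        if len(chosen) < 2 * K:        # terminating case: family cannot be extended
            return stages, picks


def covers(A, stages):
    """Check the conclusion A ⊆ B_N − B_{N−1} of the covering step."""
    last, prev = stages[-1], stages[-2]
    differences = {x - y for x in last for y in prev}
    return set(A) <= differences


if __name__ == "__main__":
    A = set(range(0, 30, 3)) | {1, 2}
    B = {0, 1}
    stages, picks = chang_enlargements(A, B, K=2)
    print([len(s) for s in stages], picks, covers(A, stages))
```

Each non-terminating step multiplies the size of the current stage by 2K, which is the growth that the termination argument above plays off against the bound on |B + nA|.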

As a consequence of these two results we obtain an inverse theorem in the 
torsion-free case. 


Theorem 5.32 (Freiman's theorem for torsion-free groups) [116], [295],
[48] Let A be an additive set in a torsion-free group Z such that
|A + A| ≤ K|A|. Let a_0 ∈ A. Then there exists a proper progression P
contained in 2A − 2A of rank at most O(K(1 + log K)) and cardinality
|P| ≤ |2A − 2A| ≤ K^{O(1)} |A|, and vectors v_1, ..., v_d in 4A − 4A with
d = O(K^{O(1)}), such that A ⊆ P + [0, 1]^d · (v_1, ..., v_d) + a_0.


Proof By translation we may assume that a_0 = 0, so 0 ∈ A. Applying Theorem
5.30 we see that 2A − 2A contains a proper progression P of rank at most
CK(1 + log K) and cardinality at least e^{−O(K(1 + log^2 K))} |A|. Note from Corollary
2.23 that |P| ≤ |2A − 2A| ≤ K^{O(1)} |A|. Now we use Lemma 5.31 to cover A by
P − P. First from Corollary 2.23 note that

    |A + P| ≤ |3A − 2A| ≤ K^{O(1)} |A| ≤ e^{O(K(1 + log^2 K))} |P|

and that |nA| ≤ K^{O(n)} |A| for all n ≥ 1. Thus by Lemma 5.31 (and the remarks
immediately following that lemma) we have

    A ⊆ P − P + [0, 1]^d · (v_1, ..., v_d)

for some v_1, ..., v_d ∈ A − A and d = O(K^{O(1)}). Also, from Lemma 3.10 we have
P − P ⊆ P + [0, 1]^{d'} · (w_1, ..., w_{d'}) where d' = O(K(1 + log K)) is the rank of
P and w_1, ..., w_{d'} ∈ P − P ⊆ 4A − 4A. Combining these facts using (3.2) we
obtain the result.

One can reduce the rank of the containing progression to K — 1, at the cost of 
worsening the size of | P|: 


Theorem 5.33 [48] Let A be an additive set in a torsion-free group Z such that
|A + A| ≤ K|A|. Then there exists a proper progression P of rank at most K − 1
which contains A such that |P| ≤ exp(O(K^{O(1)})) |A|.


Proof We may assume that |A| < 100K? (for instance) since the claim follows 
from Lemma 3.11 and Theorem 3.40 otherwise. 

Without loss of generality we may assume that A contains the origin, and then
we may assume that Z is generated by A, since otherwise we could pass from Z to
the group ⟨A⟩ generated by A. From Theorem 5.32 and (3.2) we can contain A inside a
progression Q of rank d = O(K^{O(1)}) and cardinality at most exp(O(K^{O(1)})) |A|.
Now consider the progression 2Q − 2Q, which has the same rank as Q and essen-
tially the same bounds on the cardinality. By Theorem 3.40 we can find a symmetric
proper progression R = [−N, N] · v of some rank d' ≤ d containing 2Q − 2Q
such that |R| ≤ exp(O(K^{O(1)})) |A|. In particular, the set A (which is contained
inside Q − Q) is Freiman isomorphic of order 2 to a subset Ã of [−N, N] ⊆ Z^{d'};
thus Ã has doubling constant at most K. By Freiman's lemma (Lemma 5.13) we
may place Ã in a subspace V of R^{d'} of dimension at most K − 1.

We now use the "rank reduction argument". If d' ≤ K − 1 then we are done (by
setting P = R), so suppose d' > K − 1. The intersection of [−N, N] ⊆ Z^{d'} with
V is the intersection of a convex subset with a lattice of rank strictly less than d',
with cardinality at most exp(O(K^{O(1)})) |A|, so by Lemma 3.36 we may contain it in a
progression of rank strictly less than d' and cardinality at most exp(O(K^{O(1)})) |A|,
with steps inside [−N, N]. Using the Freiman isomorphism, this allows us to
contain A in a progression Q' of rank strictly less than d and cardinality at most
exp(O(K^{O(1)})) |A|. We then iterate the above argument (replacing Q by Q') at
most d times until one can contain A in a progression P of rank at most K − 1. As
the rank decreases at each stage it is easy to see that the final progression P will
have size at most exp(O(K^{O(1)})) |A|.


The exponential factors in Theorem 5.33 cannot be removed directly, as can
be seen by considering the additive set A = {e_1, ..., e_K} in Z^K. However it is
conjectured that if one weakens the containment A ⊆ P then one can do better,
for instance


Conjecture 5.34 (Polynomial Freiman–Ruzsa conjecture) Let A be an additive
set in a torsion-free group Z such that |A + A| ≤ K|A|. Then there exists
a progression P of rank at most O(K^{O(1)}) such that |P| = O(K^{O(1)} |A|) and
|A ∩ P| = Ω(K^{−O(1)} |A|).


This would be the analog of Marton’s conjecture mentioned earlier in this 
section. Such a conjecture, if true, would allow one to obtain substantially better 
bounds on many results whose proof involves Freiman’s theorem. See [151], [152] 
for further discussion. 

By combining Theorem 5.33 with Theorem 5.20 one can show 


Proposition 5.35 [28] Let A be an additive set in a torsion-free group Z such
that |A + A| ≤ K|A| for some K < 2^d. Then there exists a proper progression P
of rank at most d and size |P| = Θ_{K,d}(|A|) such that |A ∩ P| = Θ_{K,d}(|A|).


We leave the deduction of this proposition from the previous results to Exer- 
cise 5.4.5. Recently, a more quantitative version of this proposition was obtained: 


Proposition 5.36 [162] Let A be an additive set in a torsion-free group Z such
that |A + A| ≤ K|A|. Then for any 0 < ε < 1 there exists a proper progression
P of rank at most ⌊log_2 K + ε⌋ and size at most |A| such that A is covered by
exp(O(K^3 log^3 K))/ε^{O(K)} translates of P.


Exercises 
K o(1) 


5.4.1 [154] Using Lemma 2.17 and Corollary 6.28, improve the factor of r 
in Theorem 5.27 to r?%-1, 

5.4.2 Show that the term (d + 1)|A| − d(d + 1)/2 in Corollary 5.13 cannot be
replaced by any smaller quantity.

5.4.3 Using Corollary 6.28, improve the bounds in Theorem 5.32 and Theorem 
5.33 as much as you can. 

5.4.4 [300], [151] Let Z, Z' be two r-torsion groups, let K ≥ 1, and let f : Z →
Z' be a function which is a "K-almost homomorphism" in the sense that
the set {f(x + y) − f(x) − f(y) : x, y ∈ Z} has cardinality at most K.
Show that there exists a genuine group homomorphism g : Z → Z' such
that {f(x) − g(x) : x ∈ Z} has cardinality at most r^{K^{O(1)}}. It is conjectured
that one can improve r^{K^{O(1)}} to O_r(K^{O(1)}); this would essentially imply
Marton's conjecture. See [151], [152] for further discussion.


5.5 Universal ambient groups 233 


5.4.5 Prove Proposition 5.35. (In addition to Theorem 5.33 and Theorem 5.20, 
you may use the rank reduction argument as in the proof of Theorem 5.33.) 

5.4.6 Let A be a bounded non-empty open set in R^d such that mes(A + A) ≤
K mes(A). Show that K ≥ 2^d, and that one has the containment A ⊆ B +
P, where B is a ball and P is a progression of rank O(K^{O(1)}) and volume
O(exp(K^{O(1)}) mes(A)/mes(B)). (Hint: take B to be a ball contained in A.
Now replace R^d with a lattice adapted to the scale of B.)


5.5 Universal ambient groups 


In this section we fix the order k of Freiman homomorphisms and isomorphisms, 
and shall frequently omit the phrase “of order k”. 

It is possible for two additive sets to be Freiman isomorphic even though their
ambient groups are very different. For instance, the additive sets ({1, 2, 3}, Z_6),
({1, 2, 3}, Z_7), and ({1, 2, 3}, Z) are all Freiman isomorphic of order 2, despite the
groups Z_6, Z_7, Z being different. On the other hand, the additive set ({1, 2, 3}, Z_3)
is not Freiman isomorphic of order 2 to any of the above sets and has a quite
different additive structure. It is natural to ask whether there is some universal
ambient group that one can place an additive set in, after Freiman isomorphism.
To phrase this more precisely, we introduce


Definition 5.37 (Universal ambient group) Let (A, Z) be an additive set, and let
the order k of the Freiman homomorphisms be fixed. We say that Z is a universal
ambient group (of order k) for the additive set A if every Freiman homomorphism
φ : (A, Z) → (B, W) has a unique extension to a group homomorphism φ_ext :
Z → W (thus φ_ext(x) = φ(x) for all x ∈ A). More generally, we say that an additive
group Z' is a universal ambient group for (A, Z) if there exists an additive set
(A', Z') which is Freiman isomorphic to (A, Z) such that Z' is a universal ambient
group for A'; we then call (A', Z') an embedding of (A, Z) inside the ambient
group Z'.


Examples 5.38 Let k = 2, and consider the additive set (A, Z) = ({1, 2, 3}, Z_7).
The group Z_7 is not a universal ambient group for A = {1, 2, 3}, as can be seen for
instance by considering the Freiman homomorphism φ : A → Z defined by φ(1) =
1, φ(2) = 2, φ(3) = 3. This homomorphism cannot extend to a group homomor-
phism on Z_7, since 1 has order 7 in Z_7 but has infinite order in Z. Even if one
replaces the ambient group Z_7 with Z, the additive set ({1, 2, 3}, Z) is still not placed
inside a universal ambient group, because the translation map φ(x) := x + 1 is a
Freiman homomorphism on {1, 2, 3} but does not extend to a group homomorphism
on Z. On the other hand, the additive set ({(1, 1), (2, 1), (3, 1)}, Z^2) is placed
inside a universal ambient group, as one can easily verify. But the additive set
({(1, 1, 0), (2, 1, 0), (3, 1, 0)}, Z^3) is not placed inside a universal ambient group
for a different reason, namely that the extension of Freiman homomorphisms to
group homomorphisms is not unique (one has too much freedom to decide what
to do with the third coordinate).
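
The claims made at the start of this section about the set {1, 2, 3} in the groups Z_6, Z_7, Z and Z_3 can be checked by brute force. The sketch below is illustrative only: it represents a cyclic group Z_n by a modulus (with `None` standing for the integers), and the helper names are hypothetical. A bijection is a Freiman isomorphism of order k exactly when it preserves both equality and inequality of k-fold sums, which is what `is_freiman_iso` tests.

```python
from itertools import combinations_with_replacement, permutations


def ksums(elems, k, modulus):
    """Sum of every k-element multiset of positions, in Z (modulus None) or Z_n."""
    sums = {}
    for idx in combinations_with_replacement(range(len(elems)), k):
        s = sum(elems[i] for i in idx)
        sums[idx] = s if modulus is None else s % modulus
    return sums


def is_freiman_iso(A, mod_a, B, mod_b, k=2):
    """Is the position-matching bijection A[i] -> B[i] a Freiman isomorphism of order k?"""
    sa, sb = ksums(A, k, mod_a), ksums(B, k, mod_b)
    keys = list(sa)
    return all((sa[s] == sa[t]) == (sb[s] == sb[t]) for s in keys for t in keys)


def freiman_isomorphic(A, mod_a, B, mod_b, k=2):
    """Search all bijections A -> B for a Freiman isomorphism of order k."""
    if len(A) != len(B):
        return False
    return any(is_freiman_iso(A, mod_a, list(p), mod_b, k) for p in permutations(B))


if __name__ == "__main__":
    S = [1, 2, 3]
    print(freiman_isomorphic(S, 6, S, 7))      # Z_6 versus Z_7
    print(freiman_isomorphic(S, 7, S, None))   # Z_7 versus Z
    print(freiman_isomorphic(S, 3, S, None))   # Z_3 versus Z
```

The search over all permutations is exponential in |A| and is meant only for very small sets such as the ones in these examples.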


As stated, the definition of a universal ambient group is invariant under Freiman 
isomorphism. Also, if an additive set A has two universal ambient groups Z and 
Z’, then they are necessarily group isomorphic (as can be seen by extending the 
obvious Freiman isomorphism between the two associated embeddings of A). Thus 
universal ambient groups, if they exist, are unique up to group isomorphism (for 
fixed k). 

Lev and Konyagin [232] observed that universal ambient groups always exist: 


Theorem 5.39 (Existence of universal ambient groups) [232] Fix k ≥ 2, and
let (A, Z) be an additive set. Then there exists a universal ambient group Z' for
A. Furthermore, if A' is an embedding of A inside this ambient group Z', then Z'
is generated as a group by A'. In particular, Z' is finitely generated.


Proof Let Z^A be a group of rank |A| which is freely generated by some basis
{e_a : a ∈ A}. Let ⟨X⟩ be the subgroup of Z^A generated by the elements

    X := {e_{a_1} + ··· + e_{a_k} − e_{a'_1} − ··· − e_{a'_k} :
          a_1, ..., a_k, a'_1, ..., a'_k ∈ A, a_1 + ··· + a_k = a'_1 + ··· + a'_k}.

We then define Z' := Z^A/⟨X⟩, and let A' be the image of the basis {e_a : a ∈ A}
under the canonical quotient map π : Z^A → Z^A/⟨X⟩. It is clear that Z' is generated
by A'. We now show that the map ι : A → A' defined by ι(a) := π(e_a) is a Freiman
isomorphism. Since this map is surjective, it suffices by Exercise 5.3.1 to show
that

    a_1 + ··· + a_k = a'_1 + ··· + a'_k ⟺ ι(a_1) + ··· + ι(a_k) = ι(a'_1) + ··· + ι(a'_k).

But this is clear from the construction of Z^A/⟨X⟩.

Next, let φ : (A', Z') → (B, W) be a Freiman homomorphism. Let ψ : Z^A →
W be the unique group homomorphism such that ψ(e_a) = φ(ι(a)) for all a ∈ A;
this is uniquely defined since the basis {e_a : a ∈ A} freely generates Z^A. Also
it is clear that ψ annihilates X, and hence ⟨X⟩. Thus ψ descends to a group
homomorphism φ_ext : Z^A/⟨X⟩ → W, and it is easily verified that φ_ext extends φ.
This proves existence of extensions. To prove uniqueness, it suffices to show that
any two group homomorphisms from Z' to W which agree on A' will agree on all
of Z'. But this follows since Z' is generated by A'.


For an alternative construction of the universal ambient group, see Exer- 
cise 5.5.1. For some examples of universal ambient groups, see Exercise 5.5.1 
and Exercise 5.5.17. 

If (A, Z) is an additive set with universal ambient group Z, then we can define
a degree map deg from Z to the integers Z to be the group homomorphism extending
the trivial Freiman homomorphism a ↦ 1. Thus deg equals 1 on A, equals 2 on 2A, and
more generally equals l − m on lA − mA. Thus in the universal ambient group
the sets nA for n ∈ Z are all disjoint. Also observe that deg must annihilate the
torsion group Tor(Z) := {x ∈ Z : nx = 0 for some n ∈ Z^+} of Z, since the range
Z of deg is torsion-free. This shows that Z/Tor(Z) is a non-trivial torsion-free
additive group, and hence by Corollary 3.6 is group isomorphic to Z^{d+1} for some
d ≥ 0. Since all universal ambient groups are group isomorphic, this quantity d
depends only on the additive set A, and we give it a name:


Definition 5.40 (Freiman dimension) Let A be an additive set. We define the
Freiman dimension of A to be the unique non-negative integer dim(A) = d such
that Z/Tor(Z) is group isomorphic to Z^{d+1} for every universal ambient group Z
of A.


Note that the Freiman dimension depends on the choice k of the order of 
Freiman homomorphism; see Exercise 5.5.11. Traditionally one works with the 
Freiman dimension corresponding to the case k = 2. We caution that Freiman 
dimension is not monotone; again, see Exercise 5.5.11. The Freiman dimension 
can be interpreted as the largest rank that is attainable by a Freiman isomorphic 
copy of A in a vector space; see Exercise 5.5.10. 
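
For small sets of integers or residues, the Freiman dimension can be computed from the presentation Z^A/⟨X⟩ used in the proof of Theorem 5.39. Since the torsion-free part of Z^A/⟨X⟩ has rank |A| minus the rank of ⟨X⟩, Definition 5.40 gives dim(A) = |A| − rank⟨X⟩ − 1. The sketch below is a hedged illustration of that formula, not a construction from the text: the function names are hypothetical, the elements of A are assumed to be distinct in the ambient group, and a modulus of `None` stands for the integers.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def relation_vectors(A, k, modulus=None):
    """Generators of <X>: differences of indicator vectors of k-multisets with equal sum."""
    index = {a: i for i, a in enumerate(A)}
    by_sum = {}
    for combo in combinations_with_replacement(A, k):
        s = sum(combo) if modulus is None else sum(combo) % modulus
        vec = [0] * len(A)
        for a in combo:
            vec[index[a]] += 1
        by_sum.setdefault(s, []).append(vec)
    relations = []
    for vecs in by_sum.values():
        ref = vecs[0]  # differences to one reference span all pairwise differences
        relations.extend([x - y for x, y in zip(v, ref)] for v in vecs[1:])
    return relations


def rank(rows):
    """Rank over the rationals by Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            factor = rows[i][col] / rows[rk][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[rk])]
        rk += 1
    return rk


def freiman_dimension(A, k, modulus=None):
    """dim(A) = |A| - rank<X> - 1 for the order-k universal ambient group."""
    A = list(A)
    return len(A) - rank(relation_vectors(A, k, modulus)) - 1


if __name__ == "__main__":
    for k in (2, 3):
        print(k, freiman_dimension([1, 2, 4, 5], k), freiman_dimension([1, 2, 3, 4, 5], k))
```

Running it for k = 2 and k = 3 on {1, 2, 4, 5} and {1, 2, 3, 4, 5} exhibits both the dependence on k and the failure of monotonicity noted above.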

Let (A, Z) be an additive set placed inside a universal ambient group Z, and let d
be the Freiman dimension of A.
Then by Definition 5.40 we may identify Z ≡ Z^d × Z × Tor(Z); by applying a
group isomorphism if necessary we may assume that the degree map deg
corresponds to the second (integer) coordinate of this identification, thus
deg(n, m, x) = m for all n ∈ Z^d, m ∈ Z, x ∈ Tor(Z). Now let π : Z → Z^d be the
projection to the first factor. We call the additive set [A] := (π(A), Z^d) a
torsion-free universal representation of A. It is easy to see that the torsion-free
universal representation [A] of an additive set A is unique up to affine group
isomorphisms on Z^d (i.e. up to translations and elements of SL_d(Z)). Also, since
A generates Z, we see that the image of A in Z^d × Z must generate Z^{d+1}, which
implies that Z^d lies in the affine span of [A]. In other words, rank([A]) = d.

Note that π induces a surjective Freiman homomorphism from A to [A]. If Z
has no torsion group then this is in fact a Freiman isomorphism, but in general
if A contains enough "torsion" then A and [A] will not be Freiman isomorphic;
see Exercise 5.5.9. Nevertheless, [A] remains a universal embedding of A in the
category of embeddings into torsion-free groups. More precisely:


Proposition 5.41 Let A be an additive set with Freiman dimension d, and let
[A] ⊆ Z^d be a torsion-free universal representation of A. Let π : A → [A] be
the associated Freiman homomorphism, and let φ : A → (A', Z') be any Freiman
homomorphism into a torsion-free additive group Z'. Then there exists a unique
vector v = (v_1, ..., v_d) ∈ (Z')^d and a ∈ Z' such that φ(b) = a + π(b) · v for all
b ∈ A.


Proof We may assume that A is embedded inside a universal ambient group
Z = Z^d × Z × Tor(Z), and that [A] = π(A) where π : Z → Z^d is the projection
to the first factor. On the other hand, φ extends to a group homomorphism φ_ext :
Z^d × Z × Tor(Z) → Z'. Since Z' is torsion-free, φ_ext must annihilate Tor(Z), and
thus φ_ext must take the form φ_ext(n, m, x) = n · v + m · a for all n ∈ Z^d, m ∈ Z,
x ∈ Tor(Z), where v ∈ (Z')^d and a ∈ Z'. Since A is a subset of Z^d × {1} × Tor(Z)
and π(n, m, x) = n, we thus have φ(b) = φ_ext(b) = π(b) · v + a for all b ∈ A, as
desired.

From this and Freiman’s lemma we can obtain 


Corollary 5.42 Let k ≥ 2, and let A be an additive set in a torsion-free additive
group Z such that min(|A + A|, |A − A|) < (d + 1)|A| − d(d + 1)/2 for some integer
d ≥ 1. Then dim(A) < d.


Proof Let [A] = π(A) be a torsion-free universal representation of A. By Propo-
sition 5.41 we have a Freiman homomorphism from [A] back to A, and hence A
and [A] are Freiman isomorphic. Hence we may without loss of generality work
with [A] instead of A. But then the claim follows from Lemma 5.13 (or Exercise
5.2.1), since rank([A]) = dim(A).

Thus, in the torsion-free case at least, sets with small doubling necessarily have
small Freiman dimension. A slightly weaker statement is true when the ambient
group of A is not torsion-free:


Corollary 5.43 Let k ≥ 2, and let A be an additive set. Then dim(A) ≤ σ[A]^{O(1)}.


Proof Let K := σ[A] and d := dim(A). If K < 3 then d = 0 (Exercise 5.5.13).
Hence we may assume K ≥ 3, and it will now suffice to show d = O(K^{O(1)}).
Without loss of generality we may assume that A is embedded in a universal
ambient group Z. From Proposition 2.26 we see that A + A can be covered by
O(K^{O(1)}) translates of A. Applying the quotient map π : Z → Z/Tor(Z) ≡ Z^{d+1},
we then see that π(A) + π(A) can be covered by O(K^{O(1)}) copies of π(A), and
thus |2π(A)| ≤ K^{O(1)} |π(A)|. But π(A) is Freiman isomorphic to a torsion-free
universal representation [A] of A; thus |2[A]| ≤ K^{O(1)} |[A]|. On the other hand,
since rank([A]) = d, we see from Lemma 5.13 that |2[A]| ≥ (d + 1)|[A]| −
d(d + 1)/2. Since |[A]| ≥ rank([A]) + 1 = d + 1 (for instance), we thus have
|2[A]| ≥ (d/2)|[A]| (for instance). Combining this with the upper bound on |2[A]|
we obtain the result.
For a refinement of the bounds in this corollary, see Exercise 6.5.18. 


Exercises 


5.5.1 For any additive sets A, B, let Hom_k(A → B) denote the space of Freiman
homomorphisms (of order k) from A to B. Since A is an additive set,
observe that Hom_k(A → R/Z) is an additive group which can be viewed
as a compact subgroup of a torus. In particular it has a Pontryagin dual
Z', defined as the space of all continuous group homomorphisms from
Hom_k(A → R/Z) to the circle group R/Z. For any a ∈ A, define the
Gelfand transform â ∈ Z' of a by the formula

    â(χ) := χ(a) for all χ ∈ Hom_k(A → R/Z),

and let A' := {â : a ∈ A}. Show that (A', Z') is Freiman isomorphic to
(A, Z), and that Z' is a universal ambient group for A.

5.5.2 Let A be an additive set. Show that Z^{|A|+1} is a universal ambient group
for A if and only if A is a B_k set (see Definition 4.27), in which case
the additive set ({e_j + e_{|A|+1} : 1 ≤ j ≤ |A|}, Z^{|A|+1}) is an embedding
of A into Z^{|A|+1}. Here of course e_1, ..., e_{|A|+1} is the standard basis
for Z^{|A|+1}.

5.5.3 Let A be an additive set, and let χ : A → R/Z be a Freiman homomor-
phism. Let us say that χ is infinitely divisible if for every integer n there
exists a Freiman homomorphism χ/n : A → R/Z which, when multi-
plied by n, yields χ. Show that χ is infinitely divisible if and only if there
exists a Freiman homomorphism φ : A → R such that φ mod 1 = χ.
Conclude that the tangent space of the compact group Hom_k(A → R/Z)
at the origin is canonically identifiable with Hom_k(A → R).

5.5.4 Let φ : A → B be a Freiman homomorphism (resp. isomorphism). Show
that the map φ† : Hom_k(B → R/Z) → Hom_k(A → R/Z) defined by
φ†(χ) := χ ∘ φ is a group homomorphism (resp. isomorphism). Also,
if φ : A → B and ψ : B → C are Freiman homomorphisms, show that
(ψ ∘ φ)† = φ† ∘ ψ†. Show that the adjoint functor φ ↦ φ† is a bijection
between Freiman homomorphisms from A to B, and group homomor-
phisms from Hom_k(B → R/Z) to Hom_k(A → R/Z).


5.5.5 Let G be an additive set which is also an additive group (i.e. G + G =
G). Show that Hom_k(G → R/Z) is canonically identifiable with Ĝ ×
(R/Z), where Ĝ is the Pontryagin dual of G, i.e. the space of group
homomorphisms from G to R/Z. If A is an additive set contained in
G, give examples to show that Hom_k(A → R/Z) can be much larger
or much smaller than Hom_k(G → R/Z), although Freiman duality will
convert the inclusion map from A to G to a group homomorphism from
Hom_k(G → R/Z) to Hom_k(A → R/Z).

5.5.6 Let k = 2. Show that the universal ambient group of A = ({1, 2, 3}, Z_6)
(or ({1, 2, 3}, Z_7), or ({1, 2, 3}, Z)) is canonically identifiable with Z^2,
with A being identified with {(1, 1), (2, 1), (3, 1)}. Show on the other
hand that the universal ambient group of A = ({1, 2, 3}, Z_3) is canoni-
cally identified with Z_3 × Z, with A identified with {(1, 1), (2, 1), (3, 1)}.
Show that the universal ambient group of A = ({1, 2, 4, 5}, Z) is
canonically identifiable with Z^3, with A being identified with
{(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)}.

5.5.7 Let (A, Z) be an additive set embedded inside a universal ambient group
Z, let (B, W) be another additive set, let φ : A → B be a Freiman homo-
morphism, and let φ_ext : Z → W be the group homomorphism extension.
Show that φ is a Freiman isomorphism if and only if the kernel ker(φ_ext) :=
{x ∈ Z : φ_ext(x) = 0} of φ_ext is disjoint from (kA − kA)\{0}, or equiva-
lently if φ_ext is injective on kA.

5.5.8 Let (A, Z) be an additive set embedded inside a universal ambient group
Z, and let G be an additive group. Show that G contains a subset A' that
is Freiman isomorphic to A if and only if G contains a subgroup H that
is group isomorphic to Z/T for some subgroup T of Z which is disjoint
from (kA − kA)\{0}.

5.5.9 Let (A, Z) be an additive set embedded inside a universal ambient group
Z. Show that A and [A] are Freiman isomorphic if and only if (kA −
kA) ∩ Tor(Z) = {0}. Note from Proposition 5.41 that A can be embedded
into a torsion-free additive group if and only if A and [A] are Freiman
isomorphic.

5.5.10 Let A be an additive set in a torsion-free additive group Z. Show that
there exists a Freiman-isomorphic copy (A', V') of (A, Z) inside a vector
space V' such that rank(A') = dim(A). Furthermore, we have rank(A'') ≤
dim(A) for any other Freiman isomorphic copy (A'', V'') of (A, Z) in a
vector space.

5.5.11 Let (A, Z) be the additive set ({1, 2, 4, 5}, Z). Show that dim(A) = 3 if
k = 1, that dim(A) = 2 if k = 2, and dim(A) = 1 for k ≥ 3. In particular,
when k = 2, conclude that dim({1, 2, 4, 5}) > dim({1, 2, 3, 4, 5}), thus
demonstrating that Freiman dimension is not monotone.

5.5.12 Show that the Freiman dimension dim(A) = dim_k(A) of an additive set
is a non-increasing function of k, thus dim_{k+1}(A) ≤ dim_k(A).

5.5.13 Let k ≥ 2, and let A be an additive set such that σ[A] < 3. Show that
dim(A) = 1. (Hint: embed A in a universal ambient group and apply
Corollary 5.6.)

5.5.14 Let (A, Z) and (A', Z') be additive sets. Show that dim(A ⊕ A') =
dim(A) + dim(A').

5.5.15 Let φ : A → A' be a surjective Freiman homomorphism. Show that
dim(A') ≤ dim(A).

5.5.16 Let A be an additive set, and let Z be a universal ambient group for
A. Show that Tor(Z) = {0} if and only if the group homomorphism
π : Hom_k(A → R) → Hom_k(A → R/Z) defined by π(φ) := φ mod 1
is surjective, or in other words every Freiman homomorphism from A to
R/Z lifts up to a Freiman homomorphism from A to R.

5.5.17 Let k = 2 and consider the set A := {2e_1, e_1 + e_2, 2e_2, e_2 + e_3, 2e_3,
e_3 + e_4, 2e_4, e_4 + e_1} in Z^4, where e_1, e_2, e_3, e_4 is the standard basis;
one can view this as a generic skew quadrilateral together with the midpoints.
Show that (A, Z^4) has Z^4 × (Z/2Z) as a universal ambient group. Thus
it is possible for the universal ambient group to contain some torsion
even when the additive set can be embedded in a torsion-free additive
group. Write down an embedding of A in the universal ambient group
Z^4 × (Z/2Z), and compare it with a torsion-free representation [A] of A;
are they Freiman isomorphic to each other?

5.5.18 Generalize Theorem 5.11 to handle additive sets A in any torsion-free
additive group.

5.6 Freiman’s theorem in an arbitrary group 


Now we use the universal group, combined with Fourier analysis and additive 
geometry, to obtain Freiman’s theorem in an arbitrary additive group. This result 
was first obtained by Green and Ruzsa [157]; the approach here is inspired by their 
argument but is arranged somewhat differently, relying in particular on volume 
bounds on polar bodies instead of the Ruzsa—Chang theorem (Theorem 5.30), and 
working in the universal ambient group rather than by introducing a sequence of 
successively smaller ambient groups to contain the additive set A. 




Observe that in some inverse sum set theorems (Corollary 5.6, Theorem 5.27) 
a set with small doubling was contained inside a finite group (or a coset of such a 
group), whereas in other inverse sum set theorems (Theorem 5.11, Theorem 5.32, 
and to a lesser extent Corollary 5.19) a set with small doubling was contained 
inside a progression. In general, it is convenient to place a set of small doubling 
inside a coset progression P + H, which was defined in Definition 4.21. 


Theorem 5.44 (Freiman's theorem in an arbitrary group) [157] Let K ≥ 1,
and let (A, Z) be an additive set in an arbitrary group Z such that |A + A| ≤
K|A|. Then there exists a coset progression P + H of rank at most dim(A) such that
A ⊆ P + H and |P||H| ≤ exp(O(K^{O(1)})) |A|. If Z is the universal ambient group
of A, then we can take H = Tor(Z).


One can make the constants in exp(O(K^{O(1)})) more explicit; see [157].


Proof Here we shall fix the order k of the Freiman homomorphisms under con- 
sideration to be k = 2. Without loss of generality we may assume Z is the universal 
ambient group; the general case then follows from Definition 5.37 (and the obser- 
vation that the image of a group or progression under a group homomorphism is 
still a group or progression). We write d := dim(A); from Corollary 5.43 we have
d = O(K^{O(1)}).

We know that Z is isomorphic to Z^d × Z × Tor(Z); we shall abuse notation and
identify Z with Z^d × Z × Tor(Z), in particular identifying Tor(Z) with {(0, 0)} ×
Tor(Z). We can also arrange matters so that the Z component of Z is given by the
degree map, thus deg((n, m, x)) = m for all n ∈ Z^d, m ∈ Z, x ∈ Tor(Z), and A
lives entirely in Z^d × {1} × Tor(Z). By using a group isomorphism to translate A
in the Z^d × Tor(Z) direction if necessary, we may assume that (0, 1, 0) ∈ A.

At present, Z is not a finite group and so we cannot directly apply the Fourier 
analytic techniques from Chapter 4. Thus we shall truncate Z to a finite group (cf. 
the use of Lemma 5.26 to prove Theorem 5.30); an alternative approach (which 
we do not pursue here, due to some minor measure-theoretic and analytic issues 
which arise) is to extend the theory of the Fourier transform and of Chapter 4 to 
infinite additive groups. We choose an extremely large prime number $p$ depending
on $A$ (much larger than any of the $d + 1$ coefficients of elements of $A$ in
the $\mathbf{Z}^{d+1}$ component of $Z$), and let $\pi_p : Z \to Z_p$ be the canonical projection
from $Z = \mathbf{Z}^d \times \mathbf{Z} \times \mathrm{Tor}(Z)$ to the finite additive group $Z_p := \mathbf{Z}_p^d \times \mathrm{Tor}(Z)$. If
$p$ is sufficiently large, then $\pi_p$ is a Freiman isomorphism from $A$ to the
additive set $A_p := \pi_p(A)$. We endow $Z_p$ with the symmetric non-degenerate bilinear
form

$$(\xi, \eta) \cdot (x, y) := \frac{x \cdot \xi}{p} + \eta \cdot y$$

for all $x, \xi \in \mathbf{Z}_p^d$ and $y, \eta \in \mathrm{Tor}(Z)$, where $\eta \cdot y$ is some symmetric non-degenerate
bilinear form on $\mathrm{Tor}(Z)$ (the exact choice of which will be irrelevant).
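
The truncation step can be illustrated in a toy case. The following Python sketch ignores the torsion factor and checks by brute force that coordinatewise reduction modulo a prime is a Freiman isomorphism of order 2 on a small set in $\mathbf{Z}^2$ once the prime is large compared with the coordinates. The set `A`, the two primes, and the helper name `is_freiman_iso_mod_p` are illustrative assumptions and do not come from the text.

```python
from itertools import product


def is_freiman_iso_mod_p(A, p):
    """Check that x -> x mod p is a Freiman isomorphism of order 2 on A.

    A is a list of integer tuples and the map reduces every coordinate mod p.
    The condition tested is: a + b = c + d holds over the integers
    if and only if it holds modulo p, for all a, b, c, d in A.
    """
    def add(u, v):
        return tuple(x + y for x, y in zip(u, v))

    def reduce_mod_p(u):
        return tuple(x % p for x in u)

    for a, b, c, d in product(A, repeat=4):
        equal_over_integers = add(a, b) == add(c, d)
        equal_mod_p = reduce_mod_p(add(a, b)) == reduce_mod_p(add(c, d))
        if equal_over_integers != equal_mod_p:
            return False
    return True


# Hypothetical example set in Z^2 with small coordinates.
A = [(0, 0), (1, 0), (0, 1), (3, -2), (5, 4)]
print(is_freiman_iso_mod_p(A, 3))    # small prime: new additive coincidences appear
print(is_freiman_iso_mod_p(A, 101))  # prime much larger than every coordinate
```

For the small prime the points $(3, -2)$ and $(0, 1)$ collapse to the same residue class, which is exactly the kind of spurious coincidence that a sufficiently large $p$ rules out.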

Let œ := 1 — TE Now we establish some lower bounds on the spectrum 
$\mathrm{Spec}_\alpha(A_p - A_p)$ of $A_p - A_p$, as defined in Definition 4.34.














Lemma 5.45 We have $|\mathrm{Spec}_\alpha(A_p - A_p)| \geq \exp(-O(K^{O(1)}))|Z_p|/|A_p|$.


Proof We first control the size of sum sets $nA_p$ for very large $n$. Since $A_p$ is
Freiman isomorphic to $A$, we have $\sigma[A_p] \leq K$. By Proposition 2.26 we can thus
contain $A_p$ inside a translate of a $K^C$-approximate group $H$ of size $|H| \leq K^C|A_p|$;
thus $2H \subseteq H + X$ for some $X$ of cardinality $O(K^C)$. Iterating this we see that
$nH \subseteq H + (n - 1)X$, and thus

$$|n(A_p - A_p)| \leq |2nH| \leq |H||(2n - 1)X| \leq K^C|A_p| \binom{|X| + 2n - 1}{|X|} \leq K^C|A_p|(|X| + 2n - 1)^{|X|}.$$


If we then set n := CK for a sufficiently large constant C, we can ensure that 
1 2—2n 
In(Ap — Ap) < 7” |Ap — Apl. 
We then apply Lemma 4.38 to obtain 
1 
|Spec,(Ap — Ap)| Pz(Ap — Ap) = ae 


and the claim follows (recall $|A_p - A_p| \leq K^2|A_p|$ from Ruzsa’s triangle
inequality).














Now we can use the theory of Freiman homomorphisms and the universal
ambient group to eliminate the role of the torsion group. Let $\Pi : Z \to \mathbf{Z}^d \subset \mathbf{R}^d$
be the canonical projection from $Z = \mathbf{Z}^d \times \mathbf{Z} \times \mathrm{Tor}(Z)$ to $\mathbf{Z}^d$, thus $\Pi(A)$ is a
subset of $\mathbf{Z}^d$ and hence of $\mathbf{R}^d$.


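Lemma 5.46 We have $\mathrm{Spec}_\alpha(A_p - A_p) \subseteq \mathbf{Z}_p^d \times \{0\}$. Furthermore, if $\xi' \in \mathbf{Z}_p^d$ is
such that $(\xi', 0) \in \mathrm{Spec}_\alpha(A_p - A_p)$, then there exists $\xi \in \frac{1}{p} \cdot \mathbf{Z}^d \subset \mathbf{R}^d$ with
$\xi = \xi'/p \pmod 1$ such that $|\langle x, \xi \rangle| < \frac{1}{5}$ for all $x \in \Pi(A) - \Pi(A)$.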


Proof From Ruzsa’s triangle inequality we have $|A_p - A_p| \leq K^2|A_p|$. From
Proposition 4.40 we thus see that A, — A, C Bohrz(Spec,(A, — Ap), x). Thus 
ifE € Spec,(A, — Ap), then |e(§ - x) — 1] < $ forallx € Ap — Ap. In particular 
we can find a phase e?™'? for some 6 € R such that |e(é - x) — e?"!°| < $ for all 




$x \in A_p$. We can thus find a function $\chi : A_p \to \mathbf{R}$ such that $e(\xi \cdot x) = e(\chi(x))$ and
$\theta - \frac{1}{10} < \chi(x) < \theta + \frac{1}{10}$ for all $x \in A_p$. It is then easy to see that $\chi : A_p \to \mathbf{R}$
is a Freiman homomorphism, and hence $\chi \circ \pi_p : A \to \mathbf{R}$ is a Freiman
homomorphism. Since $Z$ is a universal ambient group for $A$, we thus see that we can
extend $\chi \circ \pi_p$ to a group homomorphism $(\chi \circ \pi_p)_{\mathrm{ext}} : Z \to \mathbf{R}$. But since $\mathbf{R}$ is
torsion-free, this group homomorphism must annihilate the torsion group $\mathrm{Tor}(Z)$.
In particular, the map $\phi : x \mapsto (\chi \circ \pi_p)_{\mathrm{ext}}(x) \bmod 1$ is a group homomorphism
from $Z$ to $\mathbf{R}/\mathbf{Z}$ which annihilates $\mathrm{Tor}(Z)$. On the other hand, the map
$\phi' : x \mapsto \xi \cdot \pi_p(x)$ is another group homomorphism from $Z$ to $\mathbf{R}/\mathbf{Z}$ which agrees
with $\phi$ on $A$. Since $Z$ is a universal ambient group for $A$, this means that
$\phi = \phi'$, and thus $\phi'$ must also annihilate $\mathrm{Tor}(Z)$. In other words we see that
$\xi \cdot x = 0$ whenever $x \in \mathrm{Tor}(Z)$, which means that $\xi \in \mathbf{Z}_p^d \times \{0\}$, and the first
claim follows.

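Now let $\xi' \in \mathbf{Z}_p^d$ be such that $(\xi', 0) \in \mathrm{Spec}_\alpha(A_p - A_p)$. Then as before we can
find a Freiman homomorphism $\chi : A_p \to \mathbf{R}$ such that

$$(\xi', 0) \cdot x = \chi(x) \bmod 1 \quad \text{for all } x \in A_p \qquad (5.17)$$

and a $\theta \in \mathbf{R}$ such that

$$\theta - \frac{1}{10} < \chi(x) < \theta + \frac{1}{10} \quad \text{for all } x \in A_p, \qquad (5.18)$$

and we have a group homomorphism $(\chi \circ \pi_p)_{\mathrm{ext}} : Z \to \mathbf{R}$ which extends $\chi \circ \pi_p$
and annihilates $\mathrm{Tor}(Z)$. Since $Z = \mathbf{Z}^d \times \mathbf{Z} \times \mathrm{Tor}(Z)$, we thus see that there exist
$\xi \in \mathbf{R}^d$ and $\eta \in \mathbf{R}$ such that

$$(\chi \circ \pi_p)_{\mathrm{ext}}(n, m, x) = n \cdot \xi + m\eta \quad \text{for all } n \in \mathbf{Z}^d,\ m \in \mathbf{Z},\ x \in \mathrm{Tor}(Z).$$

Restricting this to elements of $A$ (which lie in $\mathbf{Z}^d \times \{1\} \times \mathrm{Tor}(Z)$), we obtain

$$\chi((n \bmod p, x)) = \chi(\pi_p(n, 1, x)) = n \cdot \xi + \eta \quad \text{whenever } (n, 1, x) \in A. \qquad (5.19)$$

Applying (5.17) we obtain

$$n \cdot \xi'/p = n \cdot \xi + \eta \pmod 1 \quad \text{whenever } (n, 1, x) \in A.$$

Since $(0, 1, 0) \in A$, we conclude that $\eta = 0 \pmod 1$. Since $A$ generates all of
$Z = \mathbf{Z}^d \times \mathbf{Z} \times \mathrm{Tor}(Z)$, we infer that $\xi = \xi'/p \pmod 1$ as desired; in particular
$\xi \in \frac{1}{p} \cdot \mathbf{Z}^d$. Next, we apply (5.18) to deduce that

$$\theta - \frac{1}{10} < n \cdot \xi + \eta < \theta + \frac{1}{10} \quad \text{whenever } (n, 1, x) \in A$$

and thus

$$|(n - n') \cdot \xi| < \frac{1}{5} \quad \text{whenever } n, n' \in \Pi(A),$$

and the claim follows (note that the dot product $n \cdot x$ and the inner product $\langle n, x \rangle$
agree when $n \in \mathbf{Z}^d$ and $x \in \mathbf{R}^d$).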














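Since $\Pi(A) - \Pi(A)$ is a subset of $\mathbf{Z}^d$, it is also a subset of $\mathbf{R}^d$. Let $B$ be the
convex body generated by the open convex hull of $\Pi(A) - \Pi(A)$; note that $B$
is open and non-empty because $A$ generates $Z$, and hence $\Pi(A)$ generates $\mathbf{Z}^d$.
Introducing the polar body

$$B^\circ := \{x \in \mathbf{R}^d : |x \cdot y| < 1 \text{ for all } y \in B\}$$

of $B$, we can rewrite the conclusion of Lemma 5.46 as

$$\xi \in \frac{1}{5} \cdot B^\circ.$$

Combining this with Lemma 5.45, we thus see that

$$\left| \left( \frac{1}{5} \cdot B^\circ \right) \cap \left( \frac{1}{p} \cdot \mathbf{Z}^d \right) \right| \geq \exp(-O(K^{O(1)})) \frac{|Z_p|}{|A_p|} = \exp(-O(K^{O(1)})) \frac{p^d |\mathrm{Tor}(Z)|}{|A|}$$

and thus

$$\frac{1}{p^d} \left| \left( \frac{1}{5} \cdot B^\circ \right) \cap \left( \frac{1}{p} \cdot \mathbf{Z}^d \right) \right| \geq \exp(-O(K^{O(1)})) \frac{|\mathrm{Tor}(Z)|}{|A|}.$$

Now we take limits as $p \to \infty$. Since $B^\circ$ is open and bounded, the left-hand side
is just the Riemann sum for $\mathrm{mes}(\frac{1}{5} \cdot B^\circ)$, and thus

$$\mathrm{mes}(B^\circ) \geq \exp(-O(K^{O(1)}))|\mathrm{Tor}(Z)|/|A|.$$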
Now we use the machinery from Chapter 3. Using the rather crude bound 
mes(B°)mes(B) < O(1)4 = 001)?” (5.20) 


(see Exercise 5.6.1), we can convert this lower bound on $B^\circ$ to an upper bound
for $B$:

$$\mathrm{mes}(B) \leq \exp(O(K^{O(1)}))|A|/|\mathrm{Tor}(Z)|.$$


Note that $B \cap \mathbf{Z}^d$ contains $\Pi(A) - \Pi(A)$; since $\Pi(A)$ generates $\mathbf{Z}^d$, we thus
conclude that $B \cap \mathbf{Z}^d$ linearly spans $\mathbf{R}^d$. From this and Lemma 3.26 we see that

$$|B \cap \mathbf{Z}^d| \leq \exp(O(K^{O(1)}))|A|/|\mathrm{Tor}(Z)|$$

where we have used the earlier observation $d = O(K^{O(1)})$ to absorb the $3^d d!/2^d$
factor from that lemma. Applying the discrete John theorem (Lemma 3.36) we
can thus place $B \cap \mathbf{Z}^d$ inside a progression $Q \subset \mathbf{Z}^d$ of rank at most $d$ and volume

$$|Q| \leq \exp(O(K^{O(1)}))|A|/|\mathrm{Tor}(Z)|,$$




again using the observation $d = O(K^{O(1)})$, this time to absorb the factors of
$(d^{O(1)})^d$ that will appear. Since $A$ was normalized to contain $(0, 1, 0)$, we have
the inclusions $\Pi(A) \subseteq \Pi(A) - \Pi(A) \subseteq B \cap \mathbf{Z}^d \subseteq Q$, and hence $A \subseteq \Pi^{-1}(Q)$.
But we may write $\Pi^{-1}(Q) = P + G$ where $P$ is an isomorphic copy of $Q$, and
$G := \mathrm{Tor}(Z)$. Theorem 5.44 follows.


Remark 5.47 It seems of interest to improve the exponential losses $\exp(O(K^{O(1)}))$
in the above argument. Many of these losses are really exponential in the Freiman
dimension $d$ rather than in the doubling constant $K$, so one expects to gain
somewhat when the Freiman dimension is small. However, the main step where the
exponential losses are largest lies in the proof of Lemma 5.45, where one is forced
to control extremely large sum sets of $A_p$ in order to obtain a lower bound on the
size of the spectrum. It may be that one will have to use a non-Fourier-analytic
approach in order to avoid this type of loss. On the other hand, the asymptotic
behavior of iterated sum sets is certainly relevant to the task of containing $A$ inside
a convex body or arithmetic progression (see Exercise 5.6.4). However, it may well
be that this type of argument can at least be pushed to improve $\exp(O(K^{O(1)}))$ to
a factor like $\exp(O(K \log^{O(1)} K))$ or perhaps even $\exp(O(K))$.


We now comment briefly on the slightly different argument of Green and Ruzsa
[157] in establishing the above theorem. Instead of working in a universal ambient
group, which could be infinite, they proceed by first using a Freiman isomorphism
(of order at least 16, say) to embed $A$ inside a very large finite group (similar
to the group $Z_p$ used in the analysis here), and then use an estimate similar
to Lemmas 5.45 and 5.46 to reduce the size of this ambient group $Z$ iteratively
until $|Z| \leq \exp(CK^C)|A|$ (the point being that if $|Z| > \exp(CK^C)|A|$, then the
arguments of Lemmas 5.45 and 5.46 can be used to locate a narrow Bohr set that
contains $A$, which is then Freiman isomorphic to a subset of a smaller group than
$Z$). At this point one can apply an extension of Theorem 4.42 (for arbitrary finite
additive groups, not necessarily cyclic) to show that $2A - 2A$ contains the sum of
a large progression and a large group, at which point one can conclude a
Ruzsa–Chang type theorem for arbitrary groups, which then implies the above theorem
by an argument similar to how Theorem 5.30 implies Theorem 5.32. In particular,
they establish


Theorem 5.48 (Ruzsa–Chang theorem in arbitrary groups) [157] Let $A$ be
an additive set in an arbitrary additive group $Z$ such that $|A + A| \leq K|A|$ for
some $K \geq 1$. Then $2A - 2A$ contains a set of the form $P + G$ where $P$ is a proper
symmetric progression of rank at most $CK(1 + \log K)$ and $G$ is a finite subgroup
of $Z$ such that $|P + G| = |P||G| \geq e^{-CK(1 + \log K)}|A|$.




Exercises


5.6.1 Let $B$ be a symmetric convex body, and consider the Euclidean Fourier
transform

$$\hat{1}_B(x) := \int_B e(\xi \cdot x)\, d\xi.$$

Show that this Fourier transform is large on a large subset of the polar body
$B^\circ$, and use this and the Plancherel theorem on $\mathbf{R}^d$ to establish (5.20).
(A much sharper inequality than (5.20) is available, namely Santaló’s
inequality [306], but we will not need this inequality here.)

5.6.2 [157] Let $A$ be an additive set with $|A + A| \leq K|A|$. Show that there
exists a finite group $Z$ of order $|Z| \leq \exp(O(K^{O(1)}))|A|$ such that $A$ is
Freiman isomorphic of order 2 (say) to a subset of $Z$. (Hint: combine the
analysis of this section with Exercise 5.5.8.)

5.6.3 [154] Suppose $p$ is a prime number, and $A$ is an additive set in $\mathbf{Z}_p$
such that $|A + A| \leq K|A|$ for some $K \geq 1$. Suppose also that
$|A| \leq \exp(-CK^C)p$ for some sufficiently large absolute constant $C > 1$.
Show that $A$ is Freiman isomorphic of order 2 to a subset of the integers
$\mathbf{Z}$. This is known as the Freiman rectification principle; see [29], [154]
for further discussion.

5.6.4 Let $A$ be an additive set in $\mathbf{Z}^d$ which generates $\mathbf{Z}^d$, and let $B$ be the convex
hull of $A$. Show that $|nA| = (1 + o_{n \to \infty; A}(1)) n^d\, \mathrm{mes}(B)$ as $n \to \infty$. (See
[261] for more precise results of this type.)


6 Graph-theoretic methods


Additive combinatorics is a subfield of combinatorics, and so it is no surprise that
graph theory plays an important role in this theory. Graph theory has already made
an implicit appearance in previous chapters, most notably in the proof of the
Balog–Szemerédi–Gowers theorem (Theorem 2.29). However there are several further
ways in which graph theoretical tools can be utilized in additive combinatorics. We
will only discuss a representative sample of these applications here. First we discuss
Turán’s theorem, which shows that sparse graphs contain large independent sets,
and which is useful for constructing sum-free sets. Next we give a very brief tour
of Ramsey theory, which allows one to find monochromatic structures in colored
graphs (or other colored objects), in particular allowing one to find monochromatic
progressions in any coloring of the integers (van der Waerden’s theorem). Then
we use some results about connectivity of dense graphs to establish the
Balog–Szemerédi–Gowers theorem, which relates partial sum sets to complete sum sets
and which has already been exploited in Chapter 2. Finally, we use the theory of
commutative directed graphs to establish the Plünnecke inequalities, which are
perhaps the sharpest inequalities known for sum sets and which strengthen several
of the results already established in Chapter 2.

In Chapter 10 and Chapter 11 we shall discuss one final graph-theoretical tool, 
the Szemerédi regularity lemma, which has had many applications in several areas 
of discrete mathematics, but which in additive combinatorics has had an especially 
crucial role in the study of arithmetic progressions in dense sets. 

Graph-theoretic tools are especially useful when combined with the probabilistic
method, which we already saw in Chapter 1, and indeed many of our arguments
here will be probabilistic in nature.




6.1 Basic Notions 


A graph $G = G(V, E)$ consists of a finite set $V$ of vertices (points, nodes) and
a finite set $E$ of edges, where each edge is an unordered pair $\{a, b\}$ of distinct
vertices (thus we do not allow loops).

If $\{a, b\} \in E$, we say that the two vertices $a$ and $b$ are adjacent or neighbors.
The collection of all the neighbors of $a$ shall be denoted $N(a)$. The cardinality of
$N(a)$ is called the degree of $a$ and is denoted $\deg(a)$.

Consider a subset $V'$ of $V$. We refer to the graph $G' = G'(V', E')$ where
$E' := \{e \in E : e \subseteq V'\}$ as the induced subgraph of $G$ which is spanned by $V'$. A set
$V' \subseteq V$ is independent if it spans an empty graph, i.e., there is no edge with both
endpoints in $V'$.

We say that the vertices $a_0, \ldots, a_k$ form a path of length $k$ if $\{a_i, a_{i+1}\}$ is an edge
for all $0 \leq i \leq k - 1$. If $a_k = a_0$, we refer to the path as a cycle. Three vertices
$a, b, c$ form a triangle if they form a cycle of length 3, i.e. $\{a, b\}$, $\{b, c\}$ and $\{c, a\}$
are edges.

A graph is bipartite if one can partition its vertex set into two disjoint sets A and 
B so that every edge has one end point in A and another in B; A and B are called 
the color classes of G. Bipartite graphs play an important role in what follows 
and when dealing with them, we prefer to use the notation G(A, B, E) instead 
of G(V, E). Note that in a bipartite graph G = G(A, B, E), two vertices in the 
same color class can only be connected by paths of even length, while vertices in 
opposite color classes can only be connected by paths of odd length. In particular 
all cycles must be of even length. 
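
The parity observation above also gives a practical test for bipartiteness: try to 2-color the vertices by breadth-first search, and report failure as soon as an edge joins two vertices of the same color (which certifies an odd cycle). The following Python sketch is one way to do this; the function name `two_color` and the edge-list input format are illustrative assumptions.

```python
from collections import deque


def two_color(vertices, edges):
    """Return a dict vertex -> 0/1 describing color classes, or None if G is not bipartite."""
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    color = {}
    for start in adj:
        if start in color:
            continue
        # Breadth-first search of one connected component.
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in color:
                    color[w] = 1 - color[u]
                    queue.append(w)
                elif color[w] == color[u]:
                    return None  # an edge inside one color class: odd cycle
    return color


even_cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
triangle = [(0, 1), (1, 2), (2, 0)]
print(two_color(range(4), even_cycle))
print(two_color(range(3), triangle))
```

On the 4-cycle the search returns a valid 2-coloring, while on the triangle it returns `None`.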


Exercises 


6.1.1 Prove that a graph $G$ is bipartite if and only if all cycles are of even length.

6.1.2 Let $A$ be a symmetric additive set (so $A = -A$) in a finite additive group
$Z$. The Cayley graph of $A$ is defined to be the graph with vertex set $Z$,
and two vertices $x, y$ connected by an edge if and only if $x - y \in A$.
Show that $\deg(v) = |A|$ for all $v \in Z$, and that two points $v, w \in Z$ are
connected by a path of length $n$ if and only if $v - w \in nA$. Show that
$G$ is connected if and only if $A$ spans $Z$.

6.1.3 (Popularity principle for bipartite graphs) Let $G(V_1, V_2, E)$ be a bipartite
graph with $V_2$ non-empty. Show that there exists a bipartite subgraph
$G'(V_1, V_2', E')$ of $G(V_1, V_2, E)$ with $|E'| \geq |E|/2$ and
$\deg_{G'}(v_2) \geq |E|/2|V_2|$ for all $v_2 \in V_2'$.

6.1.4 (Cauchy–Schwarz for bipartite graphs) Let $G(V_1, V_2, E)$ be a bipartite
graph with $V_1, V_2$ non-empty. Show that $G$ contains at least $|E|^2/|V_2|$
paths of length two with both endpoints in $V_1$, including degenerate paths.
Show that $G$ also contains at least $|E|^4/|V_1|^2|V_2|^2$ cycles of length four.

6.1.5 [198] Let $G(V_1, V_2, E)$ be a bipartite graph with $V_1, V_2$ non-empty. Show,
for any $k \geq 1$, that $G$ contains at least $|E|^{2k}/|V_1|^{k-1}|V_2|^k$ paths of length
$2k$ with both endpoints in $V_1$, including degenerate paths, and also that $G$
contains at least $|E|^{2k+1}/|V_1|^k|V_2|^k$ paths of length $2k + 1$ from $V_1$ to $V_2$.
(Hint: using the popularity principle, one can obtain lower bounds like
this but losing an absolute constant depending on $k$. Then use the tensor
power trick (as in Corollary 2.19) to remove this constant.)

6.1.6 Let $G = G(V, E)$ be a graph. Using the first moment method, show that
$G$ contains a bipartite subgraph $G'(A, B, E')$ with $|E'| \geq \frac{1}{2}|E|$. Give an
example to show that the number $\frac{1}{2}$ cannot be replaced by any larger
constant.


6.2 Independent sets, sum-free subsets, and Sidon sets 


Intuitively one expects graphs with small degrees to have large independent sets.
The following theorem, due to Turán, quantifies this intuition.


Theorem 6.1 (Turán’s theorem) Let $G = G(V, E)$ be a graph on $n$ vertices.
Then $G$ contains an independent set of size at least $\sum_{v \in V} \frac{1}{\deg(v) + 1}$. In particular, if
$G$ has maximal degree $d$, then $G$ has an independent set of size at least $n/(d + 1)$.


Proof We shall use the probabilistic method, or more precisely the first moment
method. Let $\pi : V \to [1, n]$ be a bijection chosen uniformly at random. Let us
call a vertex $v \in V$ good if it is larger than all its neighbors, in the sense that
$\pi(w) < \pi(v)$ whenever $w \in N(v)$, and let $S$ be the set of all good vertices. It is
clear that $S$ is an independent set. Also, for any $v \in V$, the probability that $v$ is good
can be easily verified to be $\frac{1}{\deg(v) + 1}$. Thus by linearity of expectation (1.4) we have

$$\mathbf{E}(|S|) = \sum_{v \in V} \mathbf{P}(v \in S) = \sum_{v \in V} \frac{1}{\deg(v) + 1}$$

and so $|S| \geq \sum_{v \in V} \frac{1}{\deg(v) + 1}$ with positive probability. The claim follows.
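
The proof translates directly into a randomized procedure: sample a uniformly random ordering of the vertices, keep the vertices that come after all of their neighbors, and repeat until the sample is at least as large as the expected value. The Python sketch below follows this outline; the adjacency-dictionary representation, the function names, the fixed seed and the retry limit are illustrative assumptions rather than part of the text.

```python
import random
from fractions import Fraction


def good_vertices(adj, rng):
    """One sample of the set S from the proof: vertices ranked above all their neighbors."""
    order = list(adj)
    rng.shuffle(order)
    rank = {v: i for i, v in enumerate(order)}  # uniformly random bijection onto 0..n-1
    return {v for v in adj if all(rank[w] < rank[v] for w in adj[v])}


def turan_bound(adj):
    """The quantity sum over v of 1/(deg(v) + 1), computed exactly."""
    return sum(Fraction(1, len(adj[v]) + 1) for v in adj)


def is_independent(adj, S):
    return all(w not in S for v in S for w in adj[v])


def large_independent_set(adj, seed=0, max_tries=10_000):
    """Resample until the good set reaches the bound (this happens with positive probability)."""
    rng = random.Random(seed)
    bound = turan_bound(adj)
    for _ in range(max_tries):
        S = good_vertices(adj, rng)
        if len(S) >= bound:
            return S
    raise RuntimeError("no sample reached the bound; increase max_tries")


# Example: the 5-cycle, where the bound is 5/3 and so guarantees 2 independent vertices.
cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
S = large_independent_set(cycle5)
print(sorted(S), turan_bound(cycle5))
```

The loop is only a convenience for small examples; the proof itself guarantees existence, not an efficient algorithm, and the number of retries needed depends on the graph.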


6.2.1 Sum-free subsets 


In 1965, Erdős and Moser [86] (see also [166], Problem C14) posed the following
question. If $B \subseteq A$ are two additive sets, let us say that $B$ is sum-free with respect
to $A$ if no element of $A$ can be represented as the sum of two distinct elements of
$B$. Given any additive set $A$, let $\phi(A)$ be the cardinality of the largest subset of $A$
which is sum-free with respect to $A$. Let $\phi(n)$ be the smallest value of $\phi(A)$ among
all sets $A$ of size $n$; thus $\phi(n)$ is the largest number such that every set $A$ of $n$ reals
contains a subset of cardinality $\phi(n)$ which is sum-free with respect to $A$.

Note that it is important that we require the elements of $B$ be distinct in order
for this problem to be interesting. To see this, consider the set
$A := 2^{[1, n]} = \{2, 2^2, \ldots, 2^n\}$. Clearly, if $B$ is any subset of $A$ of two or more
elements, then there exists an element of $A$ which is the sum of two (equal) elements in $B$.

It was remarked by Klarner (unpublished) and mentioned by Erdős in [86] that
$\phi(n) = \Omega(\log n)$ for large $n$. The first published proof of this bound appeared in
Choi’s paper [55] about ten years later:


Theorem 6.2 Let $n$ be a large integer. Any set $A$ of $n$ real numbers contains a
subset $B$ of cardinality $\log n - O(1)$ which is sum-free with respect to $A$. In other
words, $\phi(n) \geq \log n - O(1)$.


Proof Let us first prove the claim for sets $A$ of positive reals. Let us order the
elements of $A$ as $0 < a_1 < a_2 < \cdots < a_n$. Consider the graph $G$ with vertices $A$,
with two distinct elements $a, b \in A$ connected by an edge if and only if $a + b \in A$.
By Theorem 6.1, this graph contains an independent vertex set $B$ of size

$$|B| \geq \sum_{i=1}^{n} \frac{1}{\deg(a_i) + 1}.$$

Since $B$ is independent in $G$, we see that $B$ is sum-free with respect to $A$. Also,
since $a_i + a_j > a_i$, and there are only $n - i$ elements of $A$ larger than $a_i$, we
see that $\deg(a_i) \leq n - i$ for all $i$. Since $\sum_{i=1}^{n} \frac{1}{n - i + 1} = \log n - O(1)$, the claim
follows.

To prove the general case, observe from the pigeonhole principle that any set of
$n$ reals either contains a subset of $n/2 - O(1)$ positive reals or $n/2 - O(1)$ negative
reals, and the claim then follows (for large $n$) from the preceding paragraph.
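
The graph used in this proof is easy to build explicitly. The Python sketch below constructs it for a finite set of positive integers and extracts an independent set, which is then sum-free with respect to $A$. Note that it swaps in a deterministic minimum-degree greedy rule in place of the random ordering of Theorem 6.1; the function names and the example set $\{1, \ldots, 20\}$ are illustrative assumptions.

```python
from fractions import Fraction


def sum_graph(A):
    """Graph on A with an edge {a, b}, a != b, whenever a + b lies in A."""
    elements = sorted(set(A))
    members = set(elements)
    return {a: {b for b in elements if b != a and a + b in members} for a in elements}


def greedy_independent_set(adj):
    """Repeatedly pick a vertex of minimum current degree and delete it with its neighbors."""
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    chosen = []
    while remaining:
        v = min(remaining, key=lambda u: (len(remaining[u]), u))
        chosen.append(v)
        removed = remaining[v] | {v}
        for u in removed:
            remaining.pop(u, None)
        for u in remaining:
            remaining[u] -= removed
    return chosen


def is_sum_free_wrt(B, A):
    """True if no sum of two distinct elements of B lies in A."""
    members = set(A)
    B = list(B)
    return all(B[i] + B[j] not in members
               for i in range(len(B)) for j in range(i + 1, len(B)))


A = list(range(1, 21))
G = sum_graph(A)
B = greedy_independent_set(G)
bound = sum(Fraction(1, len(G[a]) + 1) for a in G)
print(sorted(B), float(bound))
```

For structured sets such as an interval the independent set found is much larger than the general $\log n - O(1)$ guarantee, which is a worst-case bound over all sets of $n$ reals.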
















Let us now discuss the upper bound. Thus, we are interested in constructing sets
$A$ which do not contain large sum-free subsets. Erdős and Moser [86] proved that
$\phi(n) \leq n/3$ and suggested that it probably has order $o(n)$. The first improvement
over the Erdős and Moser result was due to Selfridge, who showed $\phi(n) \leq n/4$.
Choi [55], using sieve methods, proved that $\phi(n) \leq O_\varepsilon(n^{2/5 + \varepsilon})$ for all $\varepsilon > 0$. He
also noted that in this problem it suffices to consider the special case when $A$ is a
set of positive integers. Choi’s result was slightly improved by Baltz, Schoen and
Srivastav [17], who showed that (n) < O(n?” log” 3 n). A significant improve- 
ment of the upper bound was very recently obtained by Ruzsa [297] who proved 
that 


$$\phi(n) = e^{O(\sqrt{\log n})}.$$




In the following we describe Ruzsa’s construction, which, besides being very
clever, is short and instructive. A key trick is to use a Freiman isomorphism to
embed the problem in a very large-dimensional space (see also Exercise 10.1.4).

We shall need a dimension $d = \Theta(\sqrt{\log n})$. Using a Freiman isomorphism
(see Lemma 5.25) it is enough to construct a set $A \subset \mathbf{Z}^d$ such that $|A| \geq n$ and
$\phi(A) \leq e^{O(\sqrt{\log n})}$. For any $r > 0$, let $D_r \subset \mathbf{Z}^d$ be the set of integral lattice points
in the ball of radius $r$ centered at the origin, thus

$$D_r := \Big\{ (a_1, \ldots, a_d) \in \mathbf{Z}^d : \sum_{i=1}^{d} a_i^2 \leq r^2 \Big\}.$$

We then set

$$A := \bigcup_{i=0}^{r-1} 2^i \cdot D_{r-i}$$

where $r = e^{O(\sqrt{\log n})}$. For an appropriate choice of $d$ and $r$ one can make $|A| \geq n$
and we claim that

$$\phi(A) \leq 2^d r = e^{O(\sqrt{\log n})}.$$

Indeed, let $S \subseteq A$ have cardinality greater than $2^d r$. Then by the pigeonhole
principle there exists $0 \leq i < r$ such that $|S \cap (2^i \cdot D_{r-i})| > 2^d$. Since
$|D_1| = 2d + 1 \leq 2^d$, we see that $i < r - 1$. By the pigeonhole principle again, we
can then find two vectors $s', s'' \in S \cap (2^i \cdot D_{r-i})$ which are congruent modulo
$2 \cdot \mathbf{Z}^d$ (i.e. they have the same parity in each coordinate). Then one easily
verifies that $s' + s'' \in 2^{i+1} \cdot D_{r-i-1} \subseteq A$, and so $S$ is not sum-free with respect
to $A$.


Remark 6.3 We return to the lower bound. In the same paper which established
the upper bound, Ruzsa [297] improved Choi’s result slightly by showing
$\phi(n) \geq 2 \log_3 n - 1$. Given the fact that Ruzsa’s upper bound is sub-polynomial, one may
suspect that $\phi(n) = \Theta(\log n)$, i.e., the right order of magnitude of $\phi(n)$ is $\log n$.
It is, however, not the case. In a recent paper, Sudakov, Szemerédi and Vu [340]
proved that $\phi(n)$ is super-logarithmic: thus in Landau notation


(n) = a(n) logn. 


While this result improves Choi’s result only slightly, its proof requires
heavy machinery that involves the Balog–Szemerédi–Gowers theorem, Freiman’s
theorem, and Szemerédi’s theorem. In this paper [340], the authors also
proved a hypergraph version of the Balog–Szemerédi–Gowers theorem (see
Section 6.4).




6.2.2 Turán’s theorem and triangle-free graphs


Let $G$ be a graph of $n$ vertices and maximum degree $d$. The lower bound of
$n/(d + 1)$ for the size of an independent set that is given by Theorem 6.1 cannot
be improved for general graphs $G$. Thus it was a stunning discovery when Ajtai,
Komlós and Szemerédi [2] found that one can improve this bound by a factor
$\Omega(\log d)$, provided that the graph is triangle-free (i.e. it contains no cycles of length
three):


Theorem 6.4 [2] Let $G = G(V, E)$ be a triangle-free graph on $n$ vertices with
maximum degree $d \geq 1$. Then $G$ contains an independent set of size $\Omega(\frac{n}{d} \log d)$.


Proof The original proof of Ajtai, Komlós and Szemerédi is one of the most
important proofs in probabilistic combinatorics, as it inspired the development
of the so-called semi-random method, which is one of the key achievements in
discrete mathematics in the last twenty five years (see for example the introduction
of [204]). That proof, however, is complicated and we choose to present a simpler
one, found later by Shearer [316]. Shearer’s proof also gives the specific lower
bound $\frac{n \log_2 d}{8d}$.

Let $X$ be the set of all independent sets in $G$; $X$ is clearly non-empty. Let $I$ be
an element of $X$ chosen uniformly at random; thus $I$ is an independent set of $G$.
It suffices to show that $\mathbf{E}(|I|) \geq \frac{n \log_2 d}{8d}$.

For each vertex $v \in V$ define a random variable

$$Y_v := d|I \cap \{v\}| + |I \cap N(v)|,$$

where we recall $N(v)$ is the set of neighbors of $v$. Since $I$ is independent, we see
that $Y_v = d$ when $v \in I$, and $Y_v = |\{w \in I : \{v, w\} \in E\}|$ otherwise. Since each
vertex $w$ in $I$ can be in the neighborhoods of at most $d$ other vertices, a simple
counting argument then yields that

$$\sum_{v \in V} Y_v \leq 2d|I|.$$

Taking expectations of both sides and using linearity of expectation, we conclude
that

$$\mathbf{E}(|I|) \geq \frac{1}{2d} \sum_{v \in V} \mathbf{E}(Y_v).$$

Thus to prove the desired lower bound on $\mathbf{E}(|I|)$, it will suffice to show that

$$\mathbf{E}(Y_v) \geq \frac{\log_2 d}{4}$$

for all $v \in V$.

Fix $v \in V$, and consider the induced subgraph $G'(V', E')$ spanned by
$V' = V \setminus (N(v) \cup \{v\})$. The set $I \cap V'$ is an independent set in $V'$. To prove the lower
bound on $\mathbf{E}(Y_v)$ it will suffice to establish the conditional expectation bound

$$\mathbf{E}(Y_v \mid I \cap V' = I') \geq \frac{\log_2 d}{4}$$

for all independent sets $I'$ in $V'$.

Fix this independent set $I'$. Let $J \subseteq N(v)$ be the set of vertices in $V$ that are
adjacent to $v$ but not adjacent to any vertex in $I'$. Now we make a critical use
of the triangle-free hypothesis. Since $G$ is triangle-free, $J$ is an independent set.
Therefore, once $I'$ is fixed, we can construct $I$ by either adding $v$ or adding a
subset of $J$ to $I'$. If $|J| = m$, then $J$ has $2^m$ subsets and so there are exactly $2^m + 1$
choices for $I$. If $v$ is added to $I'$, $v \in I$ so $Y_v = d$. If a subset $J'$ of $J$ is added to
$I'$, then $Y_v$ equals the cardinality of $J'$. Since the average cardinality of $J'$ is $m/2$,
and all choices of $I$ are equally likely by construction, we obtain

$$\mathbf{E}(Y_v \mid I \cap V' = I') = \frac{d}{2^m + 1} + \frac{m}{2} \cdot \frac{2^m}{2^m + 1}.$$

A routine calculation shows that for any integers $d \geq 16$ and $m \geq 1$,

$$\frac{d}{2^m + 1} + \frac{m}{2} \cdot \frac{2^m}{2^m + 1} \geq \frac{\log_2 d}{4},$$

concluding the proof.
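
The "routine calculation" at the end of the proof can be sanity-checked numerically. The Python sketch below evaluates the conditional expectation $\frac{d}{2^m+1} + \frac{m}{2} \cdot \frac{2^m}{2^m+1}$ and searches a finite range of $d$ and $m$ for a violation of the bound $\frac{\log_2 d}{4}$. The function names and the search ranges are illustrative assumptions; this is a finite spot check (it also includes the case $m = 0$), not a proof.

```python
from math import log2


def shearer_lhs(d, m):
    """d/(2^m + 1) + (m/2) * 2^m/(2^m + 1), written over the common denominator 2^m + 1."""
    return (d + m * 2 ** (m - 1)) / (2 ** m + 1)


def find_violation(d_max=2000, m_max=64):
    """Return the first (d, m) with 16 <= d <= d_max, 0 <= m <= m_max that breaks the bound, else None."""
    for d in range(16, d_max + 1):
        target = log2(d) / 4
        for m in range(0, m_max + 1):
            if shearer_lhs(d, m) < target:
                return (d, m)
    return None


print(find_violation())
print(min(shearer_lhs(16, m) for m in range(0, 65)), log2(16) / 4)
```

The first term dominates when $2^m$ is small compared with $d$ and the second when $2^m$ is large, so the minimum over $m$ occurs near $2^m \approx d$, where both terms are of order $\log_2 d$.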





Remark 6.5 Ajtai et al. conjectured that the bound $\Omega(\frac{n}{d} \log d)$ can be sharpened
to $(1 + o_{n \to \infty; d}(1)) \frac{n}{d} \log d$; this has been confirmed by Shearer [317].


6.2.3 Sidon sets 


Recall that an additive set $S$ is called a Sidon set (also known as Sidon sequence) if
the pairwise sums are all different (except for the trivial equalities $a + b = b + a$).
This notion was introduced by Sidon [319] in 1932, motivated by problems in
functional analysis.

It is well known, from the work of Erdős & Turán [100] and Singer [320] that
the maximum cardinality of a finite Sidon sequence of integers contained in $[1, n]$
is asymptotically $\sqrt{n}$; see Exercises 2.2.6 and 2.2.7. In [320] Singer showed that
for $n = p^2 + p + 1$, where $p$ is a prime, there is a Sidon set consisting of $p + 1$
integers between 1 and $n$. Because of this property, any two translates of this set
modulo $p^2 + p + 1$ intersect in exactly one residue class. Thus, the collection of
all $p^2 + p + 1$ translates can be identified with the set of lines of a projective plane
$P\mathbf{F}_p^2$ of order $p$.




Estimates concerning infinite Sidon sequences are less satisfying. Erdős &
Turán and Stöhr [339] proved that if $S$ is a Sidon sequence, then


ISAT all 


lim sup 0. 


n—- oo 
Using a greedy algorithm, it is easy to show that there is an infinite Sidon sequence
$S$ such that $|S \cap [1, n]| = \Omega(n^{1/3})$ (see Exercise 6.2.5).
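
The greedy algorithm mentioned here is short enough to write out. The Python sketch below generates the greedy Sidon sequence in the positive integers (the Mian–Chowla sequence of Exercise 6.2.5) by admitting each candidate whose new pairwise sums avoid all sums already used. The function names are illustrative assumptions.

```python
def greedy_sidon(count):
    """First `count` terms of the greedy Sidon sequence in the positive integers."""
    seq, sums = [], set()
    candidate = 1
    while len(seq) < count:
        # Sums that would be created by adding the candidate (including candidate + candidate).
        new_sums = {candidate + s for s in seq} | {2 * candidate}
        if sums.isdisjoint(new_sums):
            seq.append(candidate)
            sums |= new_sums
        candidate += 1
    return seq


def is_sidon(S):
    """True if all sums a + b with a <= b in S are distinct."""
    seen = set()
    S = sorted(S)
    for i, a in enumerate(S):
        for b in S[i:]:
            if a + b in seen:
                return False
            seen.add(a + b)
    return True


terms = greedy_sidon(12)
print(terms)
print(is_sidon(terms))
```

Since the $k$th term is at most $(k - 1)^3 + 1$ (Exercise 6.2.5), the search for each new term is guaranteed to stop, and the counting function of the sequence grows at least like $n^{1/3}$.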
It is quite hard to improve upon this trivial bound. The first breakthrough was
due to Ajtai, Komlós and Szemerédi: 


Theorem 6.6 [2] There is an infinite Sidon sequence $S \subset \mathbf{Z}^+$ such that
$|S \cap [1, n]| = \Omega(n^{1/3} \log^{1/3} n)$ for all sufficiently large $n$.


The proof of this theorem used Theorem 6.4. In fact, this theorem was first
developed as a lemma for the proof of Theorem 6.6. Recently, Ruzsa [298] has
significantly improved the above result by constructing a Sidon sequence where
$|S \cap [1, n]| \geq n^{\sqrt{2} - 1 + o(1)} = \Omega(n^{0.41})$, using a different method.


Remark 6.7 One can generalize the definition of Sidon sequences by considering
sequences where the sums of any two $h$-tuples are different. Such sequences are
called $B_h$ sets and have been studied by various authors. For instance, in [53, 192],
it was shown that if $h$ is even, then a $B_h$ set consisting of integers contained in
$[1, n]$ cannot have more than $(h/2)^{1/h}((h/2)!)^{2/h} n^{1/h} + O(n^{1/2h})$ elements. These
papers also study $B_h$ sets modulo a prime.


Remark 6.8 Let $S$ be a subset of $[1, n]$. We say that $S$ is a maximal Sidon set (with
respect to $[1, n]$) if $S$ is Sidon and is maximal with respect to inclusion (i.e., adding
any element from $[1, n] \setminus S$ to $S$ would destroy the Sidon property). It is reasonable
to ask what is the minimum size of a maximal Sidon set. It is easy to prove that
any maximal Sidon set should have at least $n^{1/3}$ elements. Ruzsa [299], using
Singer’s construction [320], showed that there is a maximal Sidon set with at most
$c n^{1/3} \log^{1/3} n$ elements.


Exercises 


6.2.1 Without using Theorem 6.1, give an elementary proof of the fact that a
graph $G$ on $n$ vertices with maximum degree $d$ must contain an independent
set of size $n/(d + 1)$. (Hint: use the greedy algorithm.) Give
examples that show that this $n/(d + 1)$ bound cannot be improved.

6.2.2 Let $G = G(V, E)$ be a graph. Show that $G$ contains an independent set
of size at least $\frac{|V|^2}{2|E| + |V|}$.

6.2.3 Generalize Theorem 6.2 to the case when $A$ takes values
in an arbitrary torsion-free additive group. (The torsion-free condition
is absolutely necessary, as can be seen by considering the case when
$A = \mathbf{Z}_N$.)

6.2.4 Let $S \subseteq [1, n]$ be a maximal Sidon set in $[1, n]$. Show that $2S - S$ contains
$[1, n]$, and conclude that $|S| = \Omega(n^{1/3})$.

6.2.5 [339] Let $S = \{1, 2, 4, 8, 13, 21, 31, \ldots\}$ be the Sidon set of positive integers
constructed by the greedy algorithm (this set is sometimes known
as the Mian–Chowla sequence). Show that the $k$th element of $S$ does not
exceed $(k - 1)^3 + 1$, and hence $|S \cap [1, n]| = \Omega(n^{1/3})$ as $n \to \infty$.

6.2.6 (Minkowski’s bound for sphere packing) A sphere packing $P$ in $\mathbf{R}^n$ is
a collection of non-intersecting open spheres with equal radii, and its
density $\Delta(P)$ is the fraction of space covered by their interior. Define $\Delta_n$
to be the supremum of $\Delta(P)$ taken over all packings in $\mathbf{R}^n$. Prove that
$\Delta_n = \Omega(2^{-n})$. (This is a special case of the Hlawka–Minkowski problem
of packing convex sets in $\mathbf{R}^n$.)

6.2.7 [218] Let the notation be as in the previous exercise. Prove that
$\Delta_n = \Omega(n 2^{-n})$. (Hint: Discretize the problem, convert the sphere packing
problem to one of finding a large independent set, and apply Ajtai et al.’s
theorem.) Up to a constant this is the best bound known for sphere
packing.

6.2.8 Prove the following extension of Theorem 6.4. Let G = G(V, E) be a 
triangle-free graph on n vertices with maximum degree d and T triangles. 
Then G contains an independent set of size 2G log aan). (Hint: 
Apply Theorem 6.4 to a properly defined random subgraph of G.) 


6.3 Ramsey theory 


We now briefly consider another application of graph theory, or more precisely
Ramsey theory, to additive combinatorics. This theory typically can produce results
of the following form: if an explicit set (such as $[1, N]$) is colored into finitely many
colors, then at least one of the color classes contains a specific arithmetic structure
(e.g. an arithmetic progression). The simplest example of this is the pigeonhole
principle: if we color an $n$-element set by fewer than $n$ colors, then there exist
two elements with the same color. Indeed one can view Ramsey theory as the
study of generalizations and repeated applications of the pigeonhole principle.
We will focus on only two results in this field, namely Schur’s theorem and the
Hales–Jewett theorem (a generalization of van der Waerden’s theorem); for a more
thorough treatment of these topics, see [143].

We say that a graph $G$ is complete if every pair of distinct vertices $v, w \in G$ is
connected by exactly one edge. An edge $k$-coloring of a graph $G(V, E)$ is a partition
of the edge set $E$ into $k$ classes $E_1, \ldots, E_k$. We say that a subgraph $G'$ of $G$ is
$E_j$-monochromatic if all of its edges lie in $E_j$.


Theorem 6.9 (Ramsey’s theorem for two colors) [276] Let $n, m \geq 1$ be integers,
and let $G = (V, E)$ be a complete graph with at least $\binom{n+m-2}{n-1}$ vertices. Then
for any edge 2-coloring $E = E_{\mathrm{blue}} \cup E_{\mathrm{red}}$, there either exists a blue-monochromatic
complete subgraph $G_{\mathrm{blue}}$ with $n$ vertices, or a red-monochromatic complete
subgraph $G_{\mathrm{red}}$ with $m$ vertices.


Example 6.10 Any two-coloring of a complete graph with six or more vertices 
into red and blue edges will contain either a blue triangle or a red triangle. 
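
Example 6.10 is small enough to confirm by exhaustive search. The following Python sketch (the function names are ours and purely illustrative) enumerates all $2^{15}$ edge 2-colorings of the complete graph on six vertices, and also shows that five vertices do not suffice.

```python
from itertools import combinations, product

def has_mono_triangle(n_vertices, color):
    """color maps each edge (i, j) with i < j to 0 (blue) or 1 (red)."""
    return any(
        color[(a, b)] == color[(a, c)] == color[(b, c)]
        for a, b, c in combinations(range(n_vertices), 3)
    )

def every_coloring_has_mono_triangle(n_vertices):
    """Check all edge 2-colorings of the complete graph on n_vertices vertices."""
    edges = list(combinations(range(n_vertices), 2))
    for bits in product((0, 1), repeat=len(edges)):
        if not has_mono_triangle(n_vertices, dict(zip(edges, bits))):
            return False  # found a coloring with no monochromatic triangle
    return True

print(every_coloring_has_mono_triangle(5))  # False
print(every_coloring_has_mono_triangle(6))  # True
```

The search is only feasible for very small graphs, since the number of colorings doubles with every additional edge.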


Proof We shall induce on the quantity $n + m$. When $n + m = 2$ (i.e. $n = m = 1$)
the claim is vacuously true. Now suppose that $n + m > 2$ and the claim has already
been proven for all smaller values of $n + m$. If $n = 1$ then the claim is again
vacuously true (with $R(1, m) = 1$), and similarly when $m = 1$. Thus we shall assume
$n, m \geq 2$.

Let $G = (V, E)$ be a complete graph with at least $\binom{n+m-2}{n-1}$ vertices, and let
$v \in V$ be an arbitrary vertex. This vertex is adjacent to at least

$$\binom{n+m-2}{n-1} - 1 = \binom{n+m-3}{n-2} + \binom{n+m-3}{n-1} - 1$$

many edges, each of which is either blue or red. Thus by the pigeonhole principle,
either $v$ is adjacent to at least $\binom{n+m-3}{n-2}$ blue edges, or is adjacent to at least
$\binom{n+m-3}{n-1}$ red edges. Suppose first that we are in the former case. Then we can find
a complete subgraph $G'$ of $G$ with at least $\binom{n+m-3}{n-2}$ vertices such that every vertex
of $G'$ is connected to $v$ by a blue edge. By the induction hypothesis (with $(n, m)$
replaced by $(n-1, m)$), $G'$ either contains a blue-monochromatic complete subgraph
$G'_{\mathrm{blue}}$ with $n - 1$ vertices, or a red-monochromatic complete subgraph $G'_{\mathrm{red}}$
with $m$ vertices. In the latter case we are already done by taking $G_{\mathrm{red}} := G'_{\mathrm{red}}$, and
in the former case we can find a blue-monochromatic complete subgraph $G_{\mathrm{blue}}$ of
$G$ with $n$ vertices by adjoining $v$ to $G'_{\mathrm{blue}}$ (and adding in all the edges connecting
$v$ and $G'_{\mathrm{blue}}$, which are all blue by construction). This disposes of the case when
$v$ is adjacent to at least $\binom{n+m-3}{n-2}$ blue edges; the case when $v$ is connected to at
least $\binom{n+m-3}{n-1}$ red edges is proven similarly (now using the inductive hypothesis at
$(n, m-1)$ instead of $(n-1, m)$).














Remark 6.11 The bound $\binom{n+m-2}{n-1}$ is sharp for very small values of $n$ and $m$, but
can be improved for larger values of $n$ and $m$, although computing the precise
constants is very difficult (for instance, when $n = m = 5$ the best constant is only
known to be somewhere between 43 and 49 inclusive). On the other hand, lower
bounds are known (see Exercise 6.3.6).
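
To make the bound of Theorem 6.9 concrete, the short Python sketch below (with a helper name of our own choosing) evaluates $\binom{n+m-2}{n-1}$ and checks the Pascal-type identity on which the inductive step of the proof relies.

```python
from math import comb

def ramsey_upper_bound(n, m):
    """The vertex count binom(n+m-2, n-1) from Theorem 6.9."""
    return comb(n + m - 2, n - 1)

# The inductive step splits the bound for (n, m) into the bounds
# for (n-1, m) and (n, m-1); this is Pascal's rule.
for n in range(2, 8):
    for m in range(2, 8):
        assert ramsey_upper_bound(n, m) == (
            ramsey_upper_bound(n - 1, m) + ramsey_upper_bound(n, m - 1)
        )

print(ramsey_upper_bound(3, 3))  # 6
print(ramsey_upper_bound(5, 5))  # 70
```

The value 6 for $n = m = 3$ agrees with Example 6.10, while the value 70 for $n = m = 5$ is well above the range quoted in the remark.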




One can iterate this theorem to an arbitrary number of colors:


Corollary 6.12 (Ramsey’s theorem for many colors) [276] Given any positive
integers $n_1, \ldots, n_m$, there exists a number $R(n_1, \ldots, n_m; m)$ such that given
any complete graph $G = (V, E)$ with at least $R(n_1, \ldots, n_m; m)$ vertices, and
any edge $m$-coloring $E = E_1 \cup \cdots \cup E_m$, there exists a $1 \leq j \leq m$ and an
$E_j$-monochromatic complete subgraph $G_j$ of $G$ with $n_j$ vertices.


Proof We induce on $m$. The case $m = 1$ is trivial, and the case $m = 2$ is just
Theorem 6.9. Now suppose inductively that $m > 2$ and the claim has already been
proven for all smaller values of $m$. We set

$$R(n_1, \ldots, n_m; m) := R(R(n_1, \ldots, n_{m-1}; m-1), n_m; 2).$$

Suppose we color the edges of the complete graph $G = K_{R(n_1, \ldots, n_m; m)}$ into $m$ color
classes $E_1, \ldots, E_m$. We coarsen this edge $m$-coloring into an edge 2-coloring
$E_1 \cup \cdots \cup E_{m-1}, E_m$. By the induction hypothesis, we see that with respect to the
coarsened coloring, either $G$ contains an $E_m$-monochromatic complete subgraph $G_m$
with $n_m$ elements, or $G$ contains an $E_1 \cup \cdots \cup E_{m-1}$-monochromatic complete
subgraph $G_{1,\ldots,m-1}$ with $R(n_1, \ldots, n_{m-1}; m-1)$ elements. In the first case we are
done; in the second case we are done by applying the induction hypothesis once
again, this time to the complete graph $G_{1,\ldots,m-1}$. This completes the induction and
thus the proof.
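
The recursion in this proof is easy to evaluate numerically. The Python sketch below is an illustration under two assumptions of ours: the two-color quantity $R(a, b; 2)$ is replaced by the explicit bound $\binom{a+b-2}{a-1}$ of Theorem 6.9, and the one-color quantity $R(n_1; 1)$ is taken to be $n_1$. The function names are hypothetical.

```python
from math import comb

def two_color_bound(n, m):
    """Explicit vertex count from Theorem 6.9."""
    return comb(n + m - 2, n - 1)

def multicolor_bound(sizes):
    """Upper bound for R(n_1, ..., n_m; m) obtained by iterating the recursion."""
    if len(sizes) == 1:
        # One color: any n_1 vertices already span a monochromatic complete graph.
        return sizes[0]
    return two_color_bound(multicolor_bound(sizes[:-1]), sizes[-1])

print(multicolor_bound([3, 3]))     # 6
print(multicolor_bound([3, 3, 3]))  # 21
```

The resulting numbers are upper bounds only; they grow very quickly with the number of colors and are not expected to be sharp.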
















We now give an immediate application of Ramsey’s theorem to an arithmetic 
setting. 


Theorem 6.13 (Schur’s theorem) [315] If $m, k$ are positive integers, there exists
a positive integer $N = N(m, k)$ such that, given any partition of $[1, N]$ into $m$
sets $[1, N] = A_1 \cup \cdots \cup A_m$, at least one of the $A_j$ contains a subset of the form
$\{x_1, \ldots, x_k, x_1 + \cdots + x_k\}$. In fact we can choose $N := R(k+1, \ldots, k+1; m) - 1$,
using the notation of Corollary 6.12.


Remarks 6.14 Schur’s theorem (in the k = 2 case) is equivalent to the assertion 
that the set [1, N] cannot be covered by m sum-free sets if N is sufficiently large 
depending on m; in particular, the integers cannot be partitioned into any finite 
number of sum-free sets. Even when k = 2, the value of N given by the above 
arguments grows double-exponentially in m (Exercise 6.3.4); this is not best pos- 
sible. For instance, it is known that given any 2-coloring of [1, N], there exist at 
least EN 2 ZN monochromatic triples of the form (x, y, x + y), and that this 
bound is sharp [280], [313] (see also [142]). 


Proof Let $G = G(V, E)$ be the complete graph on the $N + 1$ vertices $V := [1, N+1]$,
and let us edge $m$-color this graph as $E = E_1 \cup \cdots \cup E_m$, where $E_j$ is
the set of those edges $(a, b)$ for which $|a - b| \in A_j$. By Corollary 6.12, the graph $G$
must contain a complete subgraph $G'$ of $k + 1$ vertices which is $E_r$-monochromatic
for some $r$. If we list the vertices of $G'$ in order as $v_0 < v_1 < \cdots < v_k$, then the
differences $v_i - v_j$ for $i > j$ all lie in the same class $A_r$. The claim then follows
by setting $x_i := v_i - v_{i-1} \in A_r$, since $x_1 + \cdots + x_k = v_k - v_0$ also lies in $A_r$.
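
For $k = 2$ and two colors the smallest admissible $N$ in Schur’s theorem can be found by direct search. The Python sketch below (names are ours) allows $x_1 = x_2$, as the proof above does, and confirms that $[1, 4]$ can be split into two sum-free sets while $[1, 5]$ cannot.

```python
from itertools import product

def has_mono_schur_triple(coloring):
    """coloring[i] is the color of the integer i + 1; the case x = y is allowed."""
    N = len(coloring)
    return any(
        coloring[x - 1] == coloring[y - 1] == coloring[x + y - 1]
        for x in range(1, N + 1)
        for y in range(x, N - x + 1)  # y >= x and x + y <= N
    )

def schur_holds(N, m):
    """True if every m-coloring of [1, N] has a monochromatic {x, y, x + y}."""
    return all(has_mono_schur_triple(c) for c in product(range(m), repeat=N))

print(schur_holds(4, 2))  # False: {1, 4} and {2, 3} are both sum-free
print(schur_holds(5, 2))  # True
```

This is far smaller than the value $R(3, 3; 2) - 1 = 5$ would suggest in general; here the two happen to coincide, but for more colors the Ramsey bound is much larger than the truth.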














We now give the Hales—Jewett theorem, which we state in an “arithmetic” 
format. While not strictly a theorem about graphs, it is certainly close in spirit to 
Ramsey’s theorem. 


Theorem 6.15 (Hales–Jewett theorem) [169] Let $m \geq 1$ and $n \geq 1$. Then there
exists an integer $d = d(n, m) \geq 1$ such that if $[0, n-1]^d \subset \mathbf{Z}^d$ is partitioned
into $m$ non-empty sets $[0, n-1]^d = E_1 \cup \cdots \cup E_m$, then at least one of the sets $E_j$
contains a proper arithmetic progression $a + [0, n-1] \cdot v$ of length $n$, for some
$a \in [0, n-1]^d$ and $v \in [0, 1]^d$.


This theorem can be proven by a double induction. It is a special case of the fol- 
lowing more technical proposition, in which one either locates a single monochro- 
matic progression of length n, or several linked monochromatic progressions of 
length n — 1 (with each progression being monochromatic with a different color). 


Proposition 6.16 Let $m \geq 1$, $n \geq 1$, and $1 \leq s \leq m$. Then there exists an integer
$d = d(n, m, s) \geq 1$ such that if $[0, n-1]^d \subset \mathbf{Z}^d$ is partitioned into $m$ non-empty
sets $[0, n-1]^d = E_1 \cup \cdots \cup E_m$, then either at least one of the sets $E_j$ contains
a proper arithmetic progression $a + [0, n-1] \cdot v$, or there exist distinct
classes $E_{j_1}, \ldots, E_{j_s}$ and $a \in [0, n-1]^d$ and $v_1, \ldots, v_s \in [0, 1]^d$ such that
$a + [1, n-1] \cdot v_i \subseteq E_{j_i}$ for all $1 \leq i \leq s$.


Indeed, applying Proposition 6.16 with s := m one can conclude Theorem 6.15, 
since if one has m distinct monochromatic progressions a + [1, n — 1] - v;, then 
one of the progressions a + [0, n — 1]- v; must also be monochromatic by the 
pigeonhole principle. 
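
The progressions $a + [0, n-1] \cdot v$ appearing in Theorem 6.15 can be enumerated explicitly in small cases. The Python sketch below (function names are ours) lists them and tests the theorem by brute force for $n = 2$ and $m = 3$; it quantifies over all colorings, including those with empty classes, which is harmless for this check.

```python
from itertools import product

def lines(n, d):
    """All proper progressions a + [0, n-1].v in [0, n-1]^d with v in {0,1}^d."""
    result = []
    for v in product((0, 1), repeat=d):
        if not any(v):
            continue  # v = 0 would not give a proper progression
        # a + (n-1)v must stay in the cube, so a vanishes wherever v equals 1.
        free = [range(n) if vi == 0 else (0,) for vi in v]
        for a in product(*free):
            result.append(
                [tuple(ai + t * vi for ai, vi in zip(a, v)) for t in range(n)]
            )
    return result

def hales_jewett_holds(n, m, d):
    """True if every m-coloring of [0, n-1]^d has a monochromatic line."""
    points = list(product(range(n), repeat=d))
    all_lines = lines(n, d)
    for colors in product(range(m), repeat=len(points)):
        coloring = dict(zip(points, colors))
        if not any(len({coloring[p] for p in line}) == 1 for line in all_lines):
            return False
    return True

print(hales_jewett_holds(2, 3, 2))  # False
print(hales_jewett_holds(2, 3, 3))  # True
```

For $n = 2$ the lines are exactly the pairs of comparable points of the discrete cube, so coloring each point by its number of non-zero coordinates avoids monochromatic lines whenever $d < m$. Already for $n = 3$ the search space is far too large for this naive enumeration.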


Proof of Proposition 6.16 To abbreviate notation, we shall use “arithmetic
progression” in this proof to denote any proper arithmetic progression $a + [0, n-1] \cdot v$
or $a + [1, n-1] \cdot v$ in a lattice $\mathbf{Z}^d$ where $a \in [0, n-1]^d$ and $v \in [0, 1]^d$.

We use two induction loops. For the outer loop, we induce on n. The claim 
is trivial when n = 1, so we assume that n > 1 and the claim has already been 
proven for n — 1 (and for arbitrary m, s). In particular, by the above discussion we 
see that we may assume Theorem 6.15 for n — 1. 

Now we begin our inner loop, inducing on $s$. When $s = 1$ the claim follows from
Theorem 6.15 for $n - 1$ (shifting $[1, n-1]$ to $[0, n-2]$), so assume that $2 \leq s \leq m$
and the claim has already been proven for $s - 1$ (and the same value of $n$, but
with arbitrary $m$). We set $d := d(n, m, s) := d_1 + d_2$, where $d_1 := d(n, m, s-1)$
and $d_2 := d(n-1, m^s n^{s d_1})$. Let $[0, n-1]^d = E_1 \cup \cdots \cup E_m$ be a partition of
$[0, n-1]^d$ into $m$ distinct color classes. Suppose that none of the $E_j$ contain
any arithmetic progressions of length $n$. Our task is then to show that there are
$s$ distinct classes $E_{j_1}, \ldots, E_{j_s}$, $a \in [0, n-1]^d$, and $v_1, \ldots, v_s \in [0, 1]^d$ such that
$a + [1, n-1] \cdot v_i \subseteq E_{j_i}$ for all $1 \leq i \leq s$.

We write $[0, n-1]^d = [0, n-1]^{d_1} \times [0, n-1]^{d_2}$, and for each
$x \in [0, n-1]^{d_2}$ we consider the partition $[0, n-1]^{d_1} = E_{1,x} \cup \cdots \cup E_{m,x}$, where
$E_{j,x} := \{y \in [0, n-1]^{d_1} : (y, x) \in E_j\}$. Since none of the $E_j$ contain an arithmetic
progression of length $n$, neither do the $E_{j,x}$. By definition of $d_1$ and the inner
induction hypothesis, we conclude that for each $x$ there exist distinct color classes
$j_{1,x}, \ldots, j_{s-1,x}$, $a_x \in [0, n-1]^{d_1}$ and $v_{1,x}, \ldots, v_{s-1,x} \in [0, 1]^{d_1}$ such that

$$a_x + [1, n-1] \cdot v_{i,x} \subseteq E_{j_{i,x},x} \qquad (6.1)$$

for all $1 \leq i \leq s - 1$. Note that $a_x$ itself must then belong to another color class
$j_{s,x}$ distinct from $j_{1,x}, \ldots, j_{s-1,x}$, otherwise one of the classes $E_{j,x}$ would contain
an arithmetic progression of length $n$. If we set $v_{s,x} := 0$ then we see that (6.1) now
holds for $i = s$ also, although in that case the progression $a_x + [1, n-1] \cdot v_{s,x}$ is
not proper. This will however be rectified by means of the $d_2$ coordinates.

The map $x \mapsto (j_{1,x}, \ldots, j_{s,x}, a_x, v_{1,x}, \ldots, v_{s-1,x})$ is a map from $[0, n-1]^{d_2}$ to
a set of cardinality at most $m^s n^{s d_1}$. Thus it induces a partition
$[0, n-1]^{d_2} = F_1 \cup \cdots \cup F_{m^s n^{s d_1}}$ into $m^s n^{s d_1}$ color classes (some of which may be
empty). By definition of $d_2$ and the outer induction hypothesis (again shifting
$[1, n-1]$ to $[0, n-2]$), we conclude that one of the color classes $F_r$ contains an
arithmetic progression $a_{(2)} + [1, n-1] \cdot v_{(2)}$ with $a_{(2)} \in [0, n-1]^{d_2}$ and
$v_{(2)} \in [0, 1]^{d_2}$. This means that there exist distinct $j_1, \ldots, j_s \in [1, m]$,
$a_{(1)} \in [0, n-1]^{d_1}$, and $v_{1,(1)}, \ldots, v_{s,(1)} \in [0, 1]^{d_1}$ (with $v_{s,(1)} = 0$) such that
$a_{(1)} + [1, n-1] \cdot v_{i,(1)} \subseteq E_{j_i,x}$ for all $x \in a_{(2)} + [1, n-1] \cdot v_{(2)}$ and $1 \leq i \leq s$.
But if we now set $a := (a_{(1)}, a_{(2)}) \in [0, n-1]^d$ and $v_i := (v_{i,(1)}, v_{(2)}) \in [0, 1]^d$,
we see that $a + [1, n-1] \cdot v_i \subseteq E_{j_i}$ for all $1 \leq i \leq s$, and that each of the
$a + [1, n-1] \cdot v_i$ is a proper arithmetic progression of length $n - 1$. This
closes the induction loop, and the claim follows.














This theorem has a number of consequences, the most notable being perhaps 
van der Waerden’s theorem. 


Theorem 6.17 (van der Waerden, [371]) Let $k, m \geq 1$ be integers. Then there
exists an integer $N = N(k, m) \geq 1$ such that given any proper arithmetic progression
$P$ of length at least $N$ (in an arbitrary additive group $Z$), and any partition
$P = E_1 \cup \cdots \cup E_m$ of $P$ into $m$ color classes, at least one of these classes $E_j$
contains a proper arithmetic sub-progression $P'$ of $P$ of length
$|P'| = k$.
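
In the smallest non-trivial case $k = 3$, $m = 2$ the quantity $N(k, m)$ can be determined by exhaustive search. The Python sketch below (names are ours) identifies $P$ with its index set $\{0, \ldots, N-1\}$, which is legitimate because $P$ is proper, and confirms that $N = 9$ works while $N = 8$ does not.

```python
from itertools import product

def has_mono_ap(coloring, k):
    """coloring[i] colors the i-th term of P; search for a k-term sub-progression (k >= 2)."""
    N = len(coloring)
    for start in range(N):
        # the last term start + (k-1)*step must not exceed N - 1
        for step in range(1, (N - 1 - start) // (k - 1) + 1):
            if len({coloring[start + j * step] for j in range(k)}) == 1:
                return True
    return False

def vdw_holds(N, k, m):
    """True if every m-coloring of a progression of length N has a monochromatic k-term AP."""
    return all(has_mono_ap(c, k) for c in product(range(m), repeat=N))

print(vdw_holds(8, 3, 2))  # False, e.g. the coloring 0 0 1 1 0 0 1 1
print(vdw_holds(9, 3, 2))  # True
```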




We leave the proof as an exercise. Let us, however, remark that the bounds on
$N(k, m)$ which follow from the Hales–Jewett theorem are very poor,
growing as fast as the infamous Ackermann function. One can use Gowers’
theorem [138] and the pigeonhole principle to deduce a much better bound.


Remark 6.18 In the case of $k = 3$, Solymosi observed (private communication)
that one can obtain a rather good bound (which is comparable to the bound one
gets from Roth’s theorem) by a simple argument which does not involve Fourier
analysis. For simplicity, let us assume that we color a group $Z$ of cardinality $n$ by
$k$ colors. We now show that there is a monochromatic arithmetic progression of
length 3, assuming that $k$ is sufficiently small compared with $n$. Let $C_1$ be the most
popular color and let $a_1, \ldots, a_{m_1}$ be the elements colored by $C_1$. Clearly $m_1 \geq n/k$.
By the pigeonhole principle, there is an element $x \in Z$ such that there are at least
$\binom{m_1}{2}/n$ pairs $(a_i, a_j)$, $i < j$ such that $a_j - a_i = x$. If there is no monochromatic
arithmetic progression of length 3, then $b_j = a_j + x$ is not colored by $C_1$. Thus
we end up with a set $S_1$ of at least

$$\binom{m_1}{2}/n \geq n/3k^2 =: n_1$$

elements which are not colored by $C_1$. Now repeat the argument with the set $S_1$,
writing $m_2 \geq n_1/k$ for the number of elements of $S_1$ with the most popular color $C_2$;
we end up with a set $S_2$ of size at least $\binom{m_2}{2}/n \geq n/27k^6 =: n_2$ elements which are
not colored by either $C_1$ or $C_2$ (Exercise 6.3.8). Iterating this argument $k$ times,
we end up with a set of $n_k = n/3^{2^k-1}k^{2^{k+1}-2}$ elements which cannot be colored by any
color. This is a contradiction if $n_k \geq 1$, that is, if $n \geq 3^{2^k-1}k^{2^{k+1}-2}$.


Exercises 


6.3.1 Using Schur’s theorem, show that if the positive integers $\mathbf{Z}^+$ are finitely
colored and $k \geq 1$ is arbitrary, then there exist infinitely many monochromatic
sets in $\mathbf{Z}^+$ of the form $\{x_1, \ldots, x_k, x_1 + \cdots + x_k\}$. (Hint: Schur’s
theorem can easily produce one such set; now color all the elements of
that set by new colors and repeat.) Conversely, show that if the previous
claim is true, then it implies Schur’s theorem.

6.3.2 Show that if the positive integers $\mathbf{Z}^+$ are finitely colored then there
exist infinitely many distinct integers $x$ and $y$ such that $\{x, y, x + y\}$
is monochromatic. (Hint: refine the coloring so that $x$ and $2x$ always
have different colors.) A more challenging problem is to establish a similar
result for general $k$, i.e. to find infinitely many distinct $x_1, \ldots, x_k$ such
that $\{x_1, \ldots, x_k, x_1 + \cdots + x_k\}$ is monochromatic.

6.3.3 Show that if the positive integers $\mathbf{Z}^+$ are finitely colored and $k \geq 1$ is
arbitrary, then there exist infinitely many monochromatic sets of the form
$\{x_1, \ldots, x_k, x_1 \cdots x_k\}$. Thus Schur’s theorem can be adapted to products
instead of sums. However, nothing is known about the situation when
one has both sums and products; for instance, it is not even known whether,
if one finitely colors the positive integers, one can find even a single
monochromatic set of the form $\{x + y, xy\}$ for some positive integers
$x, y$ (not both equal to 1).

6.3.4 Show that the quantity $N(m, k)$ in Schur’s theorem can be taken to be
ody". 

6.3.5 Let $k$ be an integer, and let $A$ be an additive set in an ambient group $Z$ such
that |A| > (ak and let C be an arbitrary subset of Z. Show that there 
exists a set B C A of cardinality |B| = k such that either B + B C C or 
B + B is disjoint from C. 

6.3.6 [84] Show that if $n \geq 3$ and $N \leq 2^{n/2}$ then there exists a two-coloring of
the edges of the complete graph on $N$ vertices which does not contain a
monochromatic complete subgraph of $n$ vertices. (Hint: color the graph
randomly.)

6.3.7 Prove van der Waerden’s theorem. (Hint: set $N = k^d$ for a large $d$, and
identify $P$ with $[0, k-1]^d$. Then apply Theorem 6.15.)

6.3.8 Consider Remark 6.18. Show that if after the $i$th step we get an element $y$
which is colored by $C_j$ for some $j \leq i$, then $y$, $y - (d_i + \cdots + d_j)$,
$y - 2(d_i + \cdots + d_j)$ are all of color $C_j$, where $d_l$ is the “popular” difference
in step $l$.

6.3.9 Let $Z$ be an arbitrary finite additive group, partitioned into $m$ color classes
$E_1 \cup \cdots \cup E_m$. Show that for any $k \geq 1$ there exists a color class $E_j$ such
that

$$\mathbf{P}_{a, r \in Z}(a, a + r, \ldots, a + (k-1)r \in E_j) = \Omega_{k,m}(1).$$

(Hint: apply Theorem 6.17 to a random progression in $Z$ of a suitable
length $N(k, m)$ and use the first moment method.) This is a weak form of
Varnavides’ version of Szemerédi’s theorem, see Theorem 11.1.

6.3.10 Let $A$ be an additive set, and let $P(n)$ be a statement pertaining to an
element $n \in A$. Let us say that the property $P$ is $k$-choosable for some
$k \geq 1$ if, given every proper arithmetic progression of length $k$ in $A$, at
least one element $n$ of that progression obeys the property $P(n)$. Show
that if the properties $P_1(n), \ldots, P_m(n)$ are $k$-choosable, then the joint
property $P_1(n) \wedge \cdots \wedge P_m(n)$ is $O_{k,m}(1)$-choosable. (This statement is in
fact equivalent to van der Waerden’s theorem, and plays a key role in the
original proof [345] of Szemerédi’s theorem.)

6.3.11 (Multi-dimensional Hales–Jewett theorem) [169] Let $n, m, r \geq 1$. Show
that there exists an integer $d = d(n, m, r) \geq 1$ such that, given any partition
of $[0, n-1]^d$ into $m$ color classes $E_1, \ldots, E_m$, at least one of
the color classes contains a proper generalized arithmetic progression
$a + [0, n-1]^r \cdot (v_1, \ldots, v_r)$, where $a \in [0, n-1]^d$ and $v_1, \ldots, v_r \in [0, 1]^d$.
(Hint: apply Theorem 6.15 with $n$ replaced by $n^r$.)

6.3.12 (Gallai’s theorem) Let $k \geq 1$, $d \geq 1$, $m \geq 1$, and let $v_1, \ldots, v_k$ be
elements of $\mathbf{Z}^d$. Show that there exists an $N = N(k, d, m, v_1, \ldots, v_k)$
such that for any partition of the cube $[1, N]^d \subset \mathbf{Z}^d$ into $m$ color classes
$E_1, \ldots, E_m$, at least one of the color classes contains a set of the
form $\{x + r v_1, \ldots, x + r v_k\}$ for some $x \in \mathbf{Z}^d$ and some non-zero integer
$r$.


6.4 Proof of the Balog—Szemerédi—Gowers theorem 


Let $A$ and $B$ be two additive sets with common ambient group. Let $G = G(A, B, E)$
be a bipartite graph whose color classes are $A$ and $B$ and whose edge
set is $E$ (an edge is a pair $(a, b)$ where $a \in A$ and $b \in B$). Recall that the partial
sum set $A \stackrel{G}{+} B$ is defined as the collection of the sums $a + b$ where $a \in A$, $b \in B$
and $(a, b) \in E$.

Balog and Szemerédi [16] proved that if $A$ and $B$ are two sets of cardinality $n$
and $|E| \geq n^2/K$ and $|A \stackrel{G}{+} B| \leq K'n$ for some $K, K'$, then one can find $A' \subset A$
and $B' \subset B$ such that $|A'|, |B'|, |A' + B'| = \Theta_{K,K'}(n)$.

As stated, the above theorem is only useful if K and K’ are independent of n 
(or extremely slowly growing in n). With a new proof, Gowers [138] has recently 
strengthened this statement by showing that the implicit constants in the $\Theta_{K,K'}()$
notation can be taken to be polynomial in $K$ and $K'$, and hence the theorem
remains effective even when $K$ and $K'$ are as large as $n^{\varepsilon}$ for some absolute constant
$\varepsilon > 0$; we have already stated this result in Theorem 2.29. This has proven to be
immensely valuable in a number of applications in which polynomial-type bounds 
are desired, for instance in Gowers’ proof of Szemerédi’s theorem (see in particular 
Section 11.3). The polynomials in Gowers’ proof were implicit, but by following 
his ideas, one can work out the explicit version given in Theorem 2.29. Our treat- 
ment here is based on that in [340]. 

As it turns out, one can view the Balog—Szemerédi—Gowers theorem as a state- 
ment about dense bipartite graphs. Clearly, if a bipartite graph G(A, B, E) has 
many edges, then there will be many pairs of vertices a € A, b € B which are 
connected by paths of length 1. One then expects there to be many pairs a, a’ € A 
which are connected by paths of length two, and many pairs a € A, b € B which 
are connected by paths of length three. Furthermore, this connectivity becomes 
increasingly more “uniform” as the length of the path increases; compare with the
results on arithmetic progressions in sum sets in Section 4.7. It is this uniformity
which is essential to the proof of the Balog—Szemerédi—Gowers theorem. 

We begin by formalizing the above principle for paths of length two and length 
three. 


Lemma 6.19 (Paths of length two) Let $G(A, B, E)$ be a bipartite graph with
$|E| \geq |A||B|/K$ for some $K \geq 1$. Then, for any $0 < \varepsilon < 1$, there exists a subset
$A' \subset A$ such that

$$|A'| \geq \frac{|A|}{\sqrt{2}K}$$

and such that at least $(1 - \varepsilon)$ of the pairs of vertices $a, a' \in A'$ are connected by
at least $\frac{\varepsilon}{2K^2}|B|$ paths of length two in $G$.

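Proof By decreasing $K$ if necessary we may assume $|E| = |A||B|/K$. Observe
the combinatorial identities

$$\mathbf{E}_{b \in B} \frac{|N(b)|}{|A|} = \mathbf{E}_{a \in A} \frac{|N(a)|}{|B|} = \frac{|E|}{|A||B|} = \frac{1}{K}$$

and

$$\mathbf{E}_{b \in B} \frac{|N(b)|^2}{|A|^2} = \mathbf{E}_{a, a' \in A} \frac{|N(a) \cap N(a')|}{|B|}.$$

Applying Cauchy–Schwarz we conclude that

$$\mathbf{E}_{a, a' \in A} \frac{|N(a) \cap N(a')|}{|B|} \geq \frac{1}{K^2}.$$

Let $\Omega$ be the set of all pairs $(a, a')$ such that $|N(a) \cap N(a')| < \frac{\varepsilon}{2K^2}|B|$; in other
words, $(a, a') \in \Omega$ if $a, a'$ are not connected by at least $\frac{\varepsilon}{2K^2}|B|$ paths of length two.
Clearly we have

$$\mathbf{E}_{a, a' \in A} \frac{1}{\varepsilon} \mathbf{I}((a, a') \in \Omega) \frac{|N(a) \cap N(a')|}{|B|} \leq \frac{1}{2K^2}$$

and hence

$$\mathbf{E}_{a, a' \in A} \left(1 - \frac{1}{\varepsilon} \mathbf{I}((a, a') \in \Omega)\right) \frac{|N(a) \cap N(a')|}{|B|} \geq \frac{1}{2K^2}.$$

The left-hand side can be rearranged as

$$\mathbf{E}_{b \in B} \frac{1}{|A|^2} \sum_{a, a' \in N(b)} \left(1 - \frac{1}{\varepsilon} \mathbf{I}((a, a') \in \Omega)\right)$$

and hence by the pigeonhole principle there exists $b \in B$ such that

$$\frac{1}{|A|^2} \sum_{a, a' \in N(b)} \left(1 - \frac{1}{\varepsilon} \mathbf{I}((a, a') \in \Omega)\right) \geq \frac{1}{2K^2}.$$

In particular this implies that $|N(b)| \geq \frac{|A|}{\sqrt{2}K}$ and that
$|\{a, a' \in N(b) : (a, a') \in \Omega\}| \leq \varepsilon|N(b)|^2$. The claim then follows by setting $A' := N(b)$.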
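
The double-counting identity and the Cauchy–Schwarz step in this proof can be checked numerically. The Python sketch below does so on a randomly generated bipartite graph; the graph, its sizes and the helper name are illustrative assumptions of ours, and exact rational arithmetic is used so that the identity can be tested for equality.

```python
import random
from fractions import Fraction

def neighborhoods(A, B, E):
    """Return the neighborhoods N(a) in B and N(b) in A of every vertex."""
    N_A = {a: {b for b in B if (a, b) in E} for a in A}
    N_B = {b: {a for a in A if (a, b) in E} for b in B}
    return N_A, N_B

random.seed(0)
A, B = range(12), range(10)
E = {(a, b) for a in A for b in B if random.random() < 0.4}
N_A, N_B = neighborhoods(A, B, E)

density = Fraction(len(E), len(A) * len(B))  # this plays the role of 1/K

# E_b |N(b)|^2 / |A|^2
lhs = sum(Fraction(len(N_B[b]) ** 2, len(A) ** 2) for b in B) / len(B)
# E_{a,a'} |N(a) & N(a')| / |B|
rhs = sum(Fraction(len(N_A[a] & N_A[a2]), len(B)) for a in A for a2 in A) / len(A) ** 2

assert lhs == rhs            # both sides count paths a - b - a' of length two
assert rhs >= density ** 2   # Cauchy-Schwarz
```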














We now obtain an analogous result for paths of length three. 


Corollary 6.20 (Paths of length three) Let $G(A, B, E)$ be a bipartite graph
with $|E| \geq |A||B|/K$ for some $K \geq 1$. Then there exist $A' \subset A$, $B' \subset B$ with
$|A'| \geq \frac{|A|}{4\sqrt{2}K}$ and $|B'| \geq \frac{|B|}{4K}$, such that every $a \in A'$ and $b \in B'$ is connected by at
least $\frac{|A||B|}{2^{12}K^5}$ paths of length three.





Proof Before we apply Lemma 6.19 it is convenient to prepare the graph $G$ a
little bit. Let $\tilde{A}$ be the set of vertices in $A$ that have degree at least $|B|/2K$, and
let $\tilde{G} = G(\tilde{A}, B, \tilde{E})$ be the induced subgraph. Since at most $|A||B|/2K$ edges
are removed when passing from $G$ to $\tilde{G}$, we see that $\tilde{G}$ has at least $|A||B|/2K$
edges. Writing $|A| = L|\tilde{A}|$ for some $L \geq 1$ and applying Lemma 6.19 to $\tilde{G}$ (with
$K$ replaced by $2K/L$ and $\varepsilon := \frac{1}{16K}$) we can find a subset $A''$ of $\tilde{A}$ of size

$$|A''| \geq \frac{|\tilde{A}|}{\sqrt{2}(2K/L)} = \frac{|A|}{2\sqrt{2}K}$$

and such that $1 - \frac{1}{16K}$ of the pairs $a, a' \in A''$ are connected by at least $L^2|B|/128K^3$
paths of length two.

Let us call a pair $(a, a') \in A'' \times A''$ bad if they are not connected by at least $\frac{L^2|B|}{128K^3}$
paths of length two; thus there are at most $\frac{1}{16K}|A''|^2$ bad pairs. Let $A'$ be the set of
all $a \in A''$ such that at most $\frac{1}{8K}|A''|$ pairs $(a, a')$ are bad. Then $|A'' \setminus A'| \leq \frac{|A''|}{2}$, and
thus

$$|A'| \geq \frac{|A''|}{2} \geq \frac{|A|}{4\sqrt{2}K}.$$

Having constructed $A'$, we turn now to $B'$. Since every element in $\tilde{A}$ (and hence
in $A''$) has degree at least $|B|/2K$, we have

$$\sum_{b \in B} |\{a \in A'' : (a, b) \in E\}| = |\{(a, b) \in E : a \in A''\}| \geq \frac{|A''||B|}{2K},$$

so if we let

$$B' := \left\{b \in B : |\{a \in A'' : (a, b) \in E\}| \geq \frac{|A''|}{4K}\right\}$$

then we have

$$|A''||B'| \geq \sum_{b \in B'} |\{a \in A'' : (a, b) \in E\}| \geq \frac{|A''||B|}{2K} - \frac{|A''||B|}{4K} = \frac{|A''||B|}{4K}.$$

In particular we have $|B'| \geq |B|/4K$.

Finally, let $a \in A'$ and $b \in B'$ be arbitrary. By the construction of $B'$, $b$
is adjacent to at least $|A''|/4K$ elements $a'$ of $A''$. By construction of $A'$, at most
$|A''|/8K$ of the pairs $(a, a')$ are bad. Thus there are at least $|A''|/8K \geq |A|/16\sqrt{2}K^2$
vertices $a'$ which are simultaneously adjacent to $b$, and are connected to $a$ by at
least $\frac{L^2|B|}{128K^3}$ paths of length two. Thus $a$ and $b$ are connected by at least

$$\frac{|A|}{16\sqrt{2}K^2} \cdot \frac{L^2|B|}{128K^3} \geq \frac{|A||B|}{2^{12}K^5}$$

paths of length three.
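
For concreteness, here is how the paths of length three in this corollary may be counted. The Python sketch below (the function name is ours) counts pairs $(a', b')$ with $(a, b')$, $(a', b')$, $(a', b)$ all edges; as in the argument above, paths that revisit a vertex are included in the count.

```python
def paths_of_length_three(A, B, E, a, b):
    """Number of pairs (a2, b2) in A x B with (a, b2), (a2, b2), (a2, b) all in E."""
    return sum(
        1
        for b2 in B if (a, b2) in E
        for a2 in A if (a2, b2) in E and (a2, b) in E
    )

A, B = [0, 1, 2], ["x", "y"]
E = {(a, b) for a in A for b in B}  # complete bipartite graph
print(paths_of_length_three(A, B, E, 0, "x"))  # 6 = |A| * |B|
```

In the complete bipartite graph every choice of intermediate vertices gives a path, which is the extreme case $K = 1$ of the corollary.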


We can now derive as a consequence the Balog—Szemerédi—Gowers theorem, 
Theorem 2.29. 


Proof of Theorem 2.29 First observe that we may ensure that $A$ and $B$ are disjoint,
by the artificial trick of replacing the ambient group $Z$ with $Z \times \mathbf{Z}$, replacing $A$
with $A \times \{0\}$, and $B$ with $B \times \{1\}$. Let us view the set $G \subset A \times B$ in the theorem
as a bipartite graph on $A$ and $B$. Applying Corollary 6.20, we can find $A'$, $B'$
obeying (2.18), (2.19), and such that every pair $a \in A'$, $b \in B'$ is connected by at
least $|A||B|/2^{12}K^5$ paths of length three:

$$|\{(a', b') \in A \times B : (a, b'), (a', b'), (a', b) \in G\}| \geq \frac{|A||B|}{2^{12}K^5}.$$

Exploiting the obvious identity

$$a + b = (a + b') - (a' + b') + (a' + b)$$

and writing $x := a + b'$, $y := a' + b'$, $z := a' + b$, we conclude that

$$|\{(x, y, z) \in (A \stackrel{G}{+} B)^3 : x - y + z = a + b\}| \geq \frac{|A||B|}{2^{12}K^5}$$

for every $a \in A'$ and $b \in B'$. Since the total number of triples $(x, y, z)$ is at most

$$|A \stackrel{G}{+} B|^3 \leq (K')^3|A|^{3/2}|B|^{3/2},$$

we conclude that the total number of possible values for $a + b$ is at most
$2^{12}K^5(K')^3|A|^{1/2}|B|^{1/2}$, and the claim follows.
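
The counting step at the heart of this proof, in which each path of length three from $a$ to $b$ produces a distinct representation $a + b = x - y + z$ with $x, y, z$ in the partial sum set, can be illustrated on a toy example. In the Python sketch below the sets $A$, $B$, the graph $G$ and all function names are arbitrary choices of ours.

```python
from itertools import product

def partial_sumset(G):
    """The partial sum set: all a + b with (a, b) an edge of G."""
    return {a + b for a, b in G}

def representations(G, target):
    """Triples (x, y, z) of partial sums with x - y + z = target."""
    S = partial_sumset(G)
    return [(x, y, z) for x, y, z in product(S, repeat=3) if x - y + z == target]

A, B = range(4), range(4)
G = {(a, b) for a in A for b in B if (a + b) % 3 != 0 or a == b}

a, b = 1, 2
paths = [(a2, b2) for a2 in A for b2 in B
         if (a, b2) in G and (a2, b2) in G and (a2, b) in G]
triples = {(a + b2, a2 + b2, a2 + b) for a2, b2 in paths}

assert triples <= set(representations(G, a + b))
assert len(triples) == len(paths)  # distinct paths give distinct triples
```

The second assertion reflects the fact that $x = a + b'$ determines $b'$ and $z = a' + b$ determines $a'$ once $a$ and $b$ are fixed.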














Note that in this proof it is not critical that the group is abelian. For a
multiplicative group, we can replace $a + b = (a + b') - (a' + b') + (a' + b)$ by
$ab = (ab')(a'b')^{-1}(a'b)$, and the rest of the proof is the same.




To conclude this section, let us mention a generalization of the Balog–Szemerédi–Gowers
result for hypergraphs. Let $A_1, \ldots, A_k$ be additive sets with common
ambient group (which we may take to be disjoint, by the trick used above) and let $E$
be some family of ordered $k$-tuples $(a_1, \ldots, a_k)$ such that $a_i \in A_i$, $1 \leq i \leq k$. The
sets $A_1, \ldots, A_k$ together with $E$ are known as a $k$-uniform $k$-partite hypergraph
which we shall call $H$; the set $E$ is then known as the edge set of $H$ (notice
that a bipartite graph is a special case when $k = 2$). We denote by $\oplus_H A_i$ the
collection of the sums $a_1 + \cdots + a_k$ where $(a_1, \ldots, a_k) \in E$. For the case $k = 2$,
we are talking about bipartite graphs.


Theorem 6.21 [340] Let $k \geq 1$, and let $n, K$ be positive numbers. If $A_1, \ldots, A_k$
are additive sets in a group $Z$ of cardinality at most $n$, and $H(A_1, \ldots, A_k, E)$ is a
$k$-partite $k$-uniform hypergraph with at least $n^k/K$ edges and $|\oplus_H A_i| \leq Kn$,
then one can find subsets $A'_i \subset A_i$ such that

• $|A'_i| = \Omega_k(n/K^{O_k(1)})$ for all $1 \leq i \leq k$.
• $|A'_1 + \cdots + A'_k| = O_k(K^{O_k(1)}n)$.


The heart of the proof is the following claim. 


Claim 6.22 Let $A_1, \ldots, A_k$ and $n, K$ be as in the theorem above. Set
$X := \oplus_H A_i$. There are subsets $A'_i \subset A_i$, $i = 1, \ldots, k$ of cardinality at
least $\Omega_k(n/K^{O_k(1)})$ and sets $Y_j \subset Z$, $1 \leq j \leq 2k - 2$ of cardinality at most
$O_k(K^{O_k(1)}n)$, such that every element in $A'_1 + \cdots + A'_k$ can be written in the form
$x + \sum_{j=1}^{2k-2} y_j$ where $x \in X$, $y_j \in Y_j$ in at least $\Omega_k(n^{2k-2}/K^{O_k(1)})$ ways.


It is easy to deduce Theorem 6.21 from this claim. For the sets $A'_1, \ldots, A'_k$ as
in the claim, we have

$$|A'_1 + \cdots + A'_k| \leq \frac{|X| \prod_{j=1}^{2k-2} |Y_j|}{\Omega_k(n^{2k-2}K^{-O_k(1)})} = O_k(K^{O_k(1)}n)$$

as desired. The proof of Claim 6.22 is left as an exercise.


Exercises 


6.4.1 Let G=G(A, B, E) be a bipartite graph such that |E| > |A||B|/K. 
Show that there exists a subset A’ of A of cardinality |A’| > |A|/K such 
that any two elements in A’ are connected by at least one path of length 
2 in G. Show that |A|/K cannot be improved to |A|/K + 1, even when 
A, B, and K are large. 

6.4.2 [210] Let $d$ be a large integer. Let $V = \{0, 1\}^d$ be the $d$-dimensional discrete
cube, and let $G = G(V, E)$ be the graph formed by joining
an edge between $x, y \in V$ if $x$ and $y$ differ in at most $d/2$ coordinates
(i.e. if the Hamming distance between $x$ and $y$ is at most $d/2$). Show
that $|E| = (\frac{1}{2} + o_{d \to \infty}(1))|V|^2$, but if $V'$ is any subset of $V$ with size
$|V'| \geq c|V|$ then there exist $x, x'$ in $V'$ that are connected by fewer than
$o_{d \to \infty}(|V|)$ paths of length 2 in $G$. (Hint: use a volume-packing argument
to find two points $x, x'$ in $V'$ which are almost antipodal in the sense that
their Hamming distance is $d - O(1)$.) Convert this example into a bipartite
example and show that one cannot expect to eliminate the $(1 - \varepsilon)$
factor in Lemma 6.19 even if one lets $\varepsilon$ be sufficiently small depending
on $K$.

6.4.3 (Benny Sudakov, private communication) Let $G$ be a bipartite graph
$G = G(A, B, E)$ with $|A| = |B| = N$ and $|E| = \Omega(N^2)$ where $N$ is
sufficiently large. Show that $G$ contains a complete bipartite graph with
$\Omega(\log N)$ vertices in each color class. Show that the bound $\Omega(\log N)$ is
best possible.

6.4.4 Let $Z$ be the finite additive group $Z = \mathbf{Z}_2^d$ for some integer $d$, and let
$\hat{Z}$ be the Pontryagin dual. Let $G = G(Z, \hat{Z}, E)$ be the bipartite graph
formed by connecting $x \in Z$ to $\xi \in \hat{Z}$ whenever $\xi(x) = 0$. Show that
$|E| \geq |Z||\hat{Z}|/2$. Using (4.2), show that one has $|A||B| \leq |Z|$ whenever
$A \subset Z$, $B \subset \hat{Z}$ is a bipartite clique in $G$. Conversely, whenever $N_1$ and
$N_2$ are positive integers such that $N_1 N_2 = |Z|$, show that there exists
a bipartite clique $A \subset Z$, $B \subset \hat{Z}$ in $G$ with $|A| = N_1$ and $|B| = N_2$.
Compare this result with Exercise 6.4.3.

6.4.5 (Dyadic pigeonhole principle) Let $G = G(A, B, E)$ be a bipartite graph
with $|E| \geq |A||B|/K$ for some $K \geq 1$. Show that there exists some
$1 \leq K' \leq K$ and some induced subgraph $G' = G(A', B, E')$ of $G(A, B, E)$
with

$$|E|/(C + C \log K) \leq |E'| \leq |E|; \quad |A|/(C + C \log K) \leq |A'| \leq |A|$$

such that $|B|/2K' \leq \deg_{G'}(a) \leq |B|/K'$ for all $a \in A'$.
6.4.6 (Simultaneous popularity principle) Let G = G(A, B, E) be a bipartite
graph with |E| > |A||B|/K for some K > 1. Show that there exists an 
induced subgraph G’ = G(A’, B’, E') with the bounds 


[AIIB] 
2K? 
|A| 
K? 
|B| 
K? 





|A’||B’| > |E"| > 
|A’| > 


|B’| > 




such that deg,,(a) > |B|/2K and deg,,(b) > |A|/2K for all a € A’ and 

b € B’. (Hint: choose A’, B’ to maximize the quantity aoe aoe 3| .) 

6.4.7 Prove Claim 6.22. (Hint: use induction.)

6.4.8 Using the same hypotheses as Theorem 2.29, show that for any $\varepsilon > 0$ there
G' 

exists a set G’ C A’ x A’ such that |G’| > (1 — «)|A’|? and |A’ — A’| < 

2(K K')* 

Seager a 

Improve the 2!? factor in Theorem 2.29 to 2! by exploiting the fact that 

all of the paths of length three constructed in Corollary 6.20 pass through 

A’, which is a slightly smaller set than A. 

6.4.10 [38] Let $A, B$ be additive sets in an ambient group $Z$, and let
$G \subset A \times B$ be such that $|G| \geq |A||B|/K$ and $|A \stackrel{G}{+} B| \leq K|A|^{1/2}|B|^{1/2}$ for
some $K \geq 1$. Show that there exist subsets $A'$, $B'$ of $A$, $B$ such that
$|A'| = \Omega(K^{-O(1)}|A|)$, $|B'| = \Omega(K^{-O(1)}|B|)$, $d(A', B') = O(1 + \log K)$,
and $|G \cap (A' \times B')| = \Omega(K^{-O(1)}|A||B|)$. (Hint: the novelty here is that
we still wish the refinement A’ x B’ to capture a large portion of G. 
This requires that one revisit the arguments in Lemma 6.19 and Corol- 
lary 6.20 and perform some additional “popularity” refinements to ensure 
that every time one reduces the size of A or B, one still keeps a significant 
fraction of elements from G. One may also need to use Lemma 2.30 at 
times to ensure that one also keeps a large number of “popular differ- 
ences” between various refinements of A and B.) For an earlier result of 
this type, see [223]. 


6.5 Plünnecke’s theorem


One of the most useful tools for the study of sum sets is Plünnecke’s theorem. In
order to state this theorem, we first need some notation.


Definition 6.23 (Magnification ratio) A directed bipartite graph is a triple
$G(A, B, E)$, where $A, B$ are finite sets (not necessarily disjoint) and $E \subseteq A \times B$
is a collection of pairs $(a, b)$ from $A$ and $B$. We write $G : A \to B$ to emphasize
the directed nature of this graph, and also write $a \to_G b$ to denote the statement
that $(a, b) \in E$. If $X \subseteq A$, we use $G(X) := \{b \in B : a \to_G b \text{ for some } a \in X\}$ to
denote the image of $X$, and then define the magnification ratio $\|G\|$ of $G$ to be the
quantity

$$\|G\| := \min_{X \subseteq A;\, X \neq \emptyset} \frac{|G(X)|}{|X|}.$$

Equivalently, $\|G\|$ is the largest number such that $|G(X)| \geq \|G\||X|$ for all sets
$X \subseteq A$.

If $G : A \to B$ and $H : B \to C$ are two directed bipartite graphs, with $A, B, C$
disjoint, we define the composition $H \circ G : A \to C$ to be the directed graph
defined by setting $a \to_{H \circ G} c$ if and only if there exists $b \in B$ such
that $a \to_G b \to_H c$.


One can also view a directed bipartite graph G : A —> B as a multiply-valued 
function from A to B, and the magnification ratio is then a measure of the multi- 
plicity of this function. 


Example 6.24 Let $A, B$ be additive sets with common ambient group. Then we
can form the directed bipartite graph $G_{A,B} : A \to A + B$ by setting $a \to_{G_{A,B}} a + b$
if and only if $a \in A$ and $b \in B$. Observe that

$$\|G_{A,B}\| = \min_{X \subseteq A;\, X \neq \emptyset} \frac{|X + B|}{|X|} \leq \frac{|A + B|}{|A|}.$$

Also, observe that if $A, B, C$ are additive sets with $A$, $A + B$, $A + B + C$ disjoint,
then $G_{A+B,C} \circ G_{A,B} = G_{A,B+C}$.
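
For small sets the magnification ratio of $G_{A,B}$ can be computed directly from its definition as a minimum over non-empty subsets. The following Python sketch (the function name is ours) does this by brute force with exact fractions; it is exponential in $|A|$ and intended only as an illustration.

```python
from fractions import Fraction
from itertools import combinations

def magnification_ratio(A, B):
    """||G_{A,B}||: the minimum of |X + B| / |X| over non-empty subsets X of A."""
    A = sorted(A)
    best = None
    for size in range(1, len(A) + 1):
        for X in combinations(A, size):
            image = {x + b for x in X for b in B}  # the image G_{A,B}(X) = X + B
            ratio = Fraction(len(image), size)
            if best is None or ratio < best:
                best = ratio
    return best

print(magnification_ratio({0, 1, 2}, {0, 1}))  # 4/3
```

In this example the minimum is attained at $X = A$, so the inequality $\|G_{A,B}\| \leq |A + B|/|A|$ holds with equality; in general a proper subset of $A$ may do strictly better.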


For general directed bipartite graphs one has the inequality $\|H \circ G\| \geq
\|G\|\|H\|$. However there is a deeper inequality available for certain families of
directed bipartite graphs known as Plünnecke graphs. While this concept can be
given for abstract graphs, it is easiest to describe for graphs whose vertices lie in
an additive group (which is always the case for our applications).


Definition 6.25 (Plünnecke graphs) Let $A_0, A_1, A_2$ be three additive sets in an
additive group $Z$. Two directed bipartite graphs $G_1 : A_0 \to A_1$ and $G_2 : A_1 \to A_2$
are said to be commutative if, whenever $a, b, c \in Z$ are such that
$a \to_{G_1} a + b \to_{G_2} a + b + c$, then one also has $a \to_{G_1} a + c \to_{G_2} a + b + c$. More
generally, if $k \geq 2$, and $A_0, \ldots, A_k$ are additive sets in $Z$, we define a Plünnecke
graph of order $k$ to be a $k$-tuple $(G_1, \ldots, G_k)$ of bipartite graphs $G_j : A_{j-1} \to A_j$
such that each adjacent pair $G_j, G_{j+1}$ for $1 \leq j < k$ is commutative.


Here is a more informal way to describe commutativity: if two adjacent edges of
a parallelogram lie in $G_1 \cup G_2$, then so do the other two edges of the parallelogram.


Example 6.26 Let $A, B$ be additive sets. Then the $k$-tuple
$$(G_{A,B}, G_{A+B,B}, \ldots, G_{A+(k-1)B,B})$$
of directed bipartite graphs (as defined in Example 6.24) forms a Plünnecke graph.
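The commutativity condition of Definition 6.25 is easy to test mechanically when the vertices are integers. The following sketch represents a graph by its set of directed edges and checks the condition for the first two graphs of Example 6.26; the function names and the sample sets are assumptions made for this illustration.

```python
def sumset(X, Y):
    """Return the sum set X + Y of two finite sets of integers."""
    return {x + y for x in X for y in Y}


def addition_graph(A, B):
    """Edge set of G_{A,B}: an edge a -> a + b for every a in A and b in B."""
    return {(a, a + b) for a in A for b in B}


def is_commutative(G1, G2):
    """Check: a -> a+b in G1 and a+b -> a+b+c in G2 imply a -> a+c in G1 and a+c -> a+b+c in G2."""
    for (a, ab) in G1:
        for (mid, abc) in G2:
            if mid != ab:
                continue
            c = abc - ab
            if (a, a + c) not in G1 or (a + c, abc) not in G2:
                return False
    return True


# Hypothetical sample data.
A = {0, 1, 5}
B = {0, 2, 7}
G1 = addition_graph(A, B)
G2 = addition_graph(sumset(A, B), B)
print(is_commutative(G1, G2))               # the pair from Example 6.26
print(is_commutative({(0, 1)}, {(1, 3)}))   # a single path with no parallel partner
```

The second call fails the test because the edge $0 \to 2$ required by the parallelogram is missing.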


We are now ready to state Plünnecke’s theorem. 




Theorem 6.27 (Plünnecke’s theorem) [273] Let $(G_1, \ldots, G_k)$ be a Plünnecke graph of order $k$. Then the sequence of magnification ratios $\|G_i \circ \cdots \circ G_1\|^{1/i}$, $i = 1, \ldots, k$, is non-increasing in $i$. In particular, we have

$$\|G_k \circ \cdots \circ G_1\| \leq \|G_1\|^k.$$
Applying this theorem to Example 6.26, we immediately obtain 


Corollary 6.28 (Plünnecke’s inequality) If $A$ and $B$ are two additive sets in an ambient group $Z$ and $|A + B| \leq K|A|$, then for any positive integer $k$ there is a subset $X$ of $A$ such that

$$|X + kB| \leq K^k |X|.$$

In particular we have

$$|kB| \leq K^k |A|.$$


This inequality has a number of applications to sum set estimates. For instance, 
from this inequality and the Ruzsa triangle inequality we obtain 


Corollary 6.29 (Plünnecke–Ruzsa estimates) Suppose that $A, B$ are two additive sets in an ambient group $Z$ such that $|A + B| \leq K|A|$. Then we have

$$|nB - mB| \leq K^{n+m}|A| \quad \text{for all } n, m \geq 1.$$





In particular, this implies that if $|A + A| \leq K|A|$, then $|nA - nA| \leq K^{2n}|A|$ for all $n \geq 1$; thus sets which are approximately closed under addition or subtraction are also approximately closed under repeated additions and subtractions.
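Plünnecke's inequality can be sanity-checked numerically on small sets of integers. The sketch below compares the brute-force magnification ratio of $G_{A,kB}$ with the $k$th power of that of $G_{A,B}$, and also checks the consequence $|kB| \leq K^k|A|$ with $K = |A+B|/|A|$. The helper names, the random sample sizes and the seed are illustrative assumptions; the check is exponential in $|A|$ and is no substitute for the proof.

```python
import random
from fractions import Fraction
from itertools import combinations


def sumset(X, Y):
    """Return the sum set X + Y of two finite sets of integers."""
    return {x + y for x in X for y in Y}


def iterated_sumset(B, k):
    """Return kB = B + ... + B (k summands)."""
    S = {0}
    for _ in range(k):
        S = sumset(S, B)
    return S


def min_ratio(A, S):
    """Brute-force min over non-empty X in A of |X + S| / |X|."""
    elements = sorted(A)
    return min(
        Fraction(len(sumset(X, S)), r)
        for r in range(1, len(elements) + 1)
        for X in combinations(elements, r)
    )


def check_plunnecke(A, B, k):
    """Check ||G_{A,kB}|| <= ||G_{A,B}||^k and |kB| <= K^k |A| for one instance."""
    kB = iterated_sumset(B, k)
    K = Fraction(len(sumset(A, B)), len(A))
    return min_ratio(A, kB) <= min_ratio(A, B) ** k and len(kB) <= K ** k * len(A)


random.seed(0)
trials = [
    (set(random.sample(range(20), 6)), set(random.sample(range(20), 4)))
    for _ in range(20)
]
all_ok = all(check_plunnecke(A, B, k) for A, B in trials for k in (2, 3))
print(all_ok)
```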


6.5.1 Main ideas of the proof 


To prove Plünnecke’s theorem, it suffices to prove that

$$\|G_k \circ \cdots \circ G_1\|^{1/k} \leq \|G_i \circ \cdots \circ G_1\|^{1/i} \qquad (6.2)$$

for all $1 \leq i < k$, since the claim then follows by truncating $k$ to equal $i + 1$. In fact, it will suffice to show a special “normalized” case of this inequality:


Proposition 6.30 (Normalized Plünnecke inequality) Let $(G_1, \ldots, G_k)$ be a Plünnecke graph of order $k$ such that $\|G_k \circ \cdots \circ G_1\| \geq 1$. Then we have $\|G_i \circ \cdots \circ G_1\| \geq 1$ for all $1 \leq i < k$.


Our proof consists of two steps. In the first, we show that Proposition 6.30 
implies the theorem. In the second step we prove this proposition. 

The main tool for the first step is the so-called “tensor product” trick. We first show that Proposition 6.30 implies an inequality somewhat weaker than what we want to prove. Applying this inequality to a high power of the graph under consideration and taking limits will enable us to obtain the full version.

The second step is a pure graph-theoretical argument, whose main ingredient is the classical theorem of Menger about the number of disjoint paths in a graph. The reader may sense a connection here, as the assumption $\|G_k \circ \cdots \circ G_1\| \geq 1$ simply means that the number of vertices in $A_k$ which can be reached from a subset $X$ of $A_0$ by a directed path of length $k$ is at least $|X|$.


6.5.2 The first step 


If $G : A \to B$ and $G' : A' \to B'$ are bipartite graphs, we define the direct sum $G \oplus G' : A \oplus A' \to B \oplus B'$ by requiring $(a, a') \to_{G \oplus G'} (b, b')$ if and only if $a \to_G b$ and $a' \to_{G'} b'$. It turns out that the notion of direct sum interacts well with those of magnification ratio and composition.


Claim 6.31
$$\|G \oplus H\| = \|G\|\,\|H\|. \qquad (6.3)$$
$$(G_k \circ \cdots \circ G_1) \oplus (H_k \circ \cdots \circ H_1) = (G_k \oplus H_k) \circ \cdots \circ (G_1 \oplus H_1). \qquad (6.4)$$


The proofs are left as exercises. 
To prove (6.2), it now suffices to prove the apparently weaker inequality

$$\|G_k \circ \cdots \circ G_1\|^{1/k} \leq C_{i,k}\, \|G_i \circ \cdots \circ G_1\|^{1/i} \qquad (6.5)$$

for some constant $C_{i,k} > 0$ depending on $i$ and $k$. For, if we could prove (6.5) for all Plünnecke graphs $G$, we could in particular apply it to higher powers $G^{\oplus M}$ for any large $M$. Using the above claim, it follows that

$$\|G_k \circ \cdots \circ G_1\|^{M/k} \leq C_{i,k}\, \|G_i \circ \cdots \circ G_1\|^{M/i}$$

for all $M \geq 1$. Taking $M$th roots and then letting $M \to \infty$ we obtain (6.2).
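For readers who want the limiting step spelled out, here is one way to write it. The abbreviations $a$ and $b$ are shorthand introduced only for this display, and $C$ denotes the constant in (6.5), which does not depend on $M$.

```latex
% Shorthand (not from the text): a := \|G_k \circ \cdots \circ G_1\|,  b := \|G_i \circ \cdots \circ G_1\|.
% By (6.4), composing the M-fold direct sums gives the M-fold direct sum of the compositions,
% and by (6.3) its magnification ratio is the M-th power of the original one.
\begin{align*}
  \|G_k^{\oplus M} \circ \cdots \circ G_1^{\oplus M}\| &= a^M, &
  \|G_i^{\oplus M} \circ \cdots \circ G_1^{\oplus M}\| &= b^M, \\
  a^{M/k} &\le C\, b^{M/i}
  &&\Longrightarrow\quad a^{1/k} \le C^{1/M}\, b^{1/i}
  \;\xrightarrow[M \to \infty]{}\; b^{1/i}.
\end{align*}
```

The point of the trick is that the constant is raised to the power $1/M$ while the two magnification ratios are not, so any fixed multiplicative loss disappears in the limit.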
We next deduce (6.5) from Proposition 6.30. First we deal with the case $\|G_k \circ \cdots \circ G_1\|^{1/k} \leq 1$. Let $N$ be the smallest positive integer such that

$$\|G_k \circ \cdots \circ G_1\| \geq \frac{k^k}{N^k}.$$

As $\|G_k \circ \cdots \circ G_1\|^{1/k} \leq 1$, we have $N \geq k \geq 2$, and the definition of $N$ implies that

$$\|G_k \circ \cdots \circ G_1\|^{1/k} < \frac{k}{N-1} \leq \frac{2k}{N}.$$
We introduce an auxiliary Plünnecke graph $(H_1, \ldots, H_k)$ of order $k$, constructed as follows. Let $E := \{e_1, \ldots, e_N\}$ be the basis vectors of $\mathbf{Z}^N$, and set

$$(H_1, \ldots, H_k) := (G_{0E,E}, G_{E,E}, G_{2E,E}, \ldots, G_{(k-1)E,E}),$$

where we use the notation of Example 6.24. In other words, we have $u \to_{H_i} u + e_j$ whenever $u$ is the sum of $i - 1$ basis vectors and $1 \leq j \leq N$. It is easy to show that the $i$th vertex set $iE$ has cardinality $\binom{N+i-1}{i}$. Since

$$\frac{N^i}{i^i} \leq \frac{N^i}{i!} \leq \frac{(N+i-1)!}{(N-1)!\, i!} \leq N^i,$$

we have that

$$\frac{1}{i} N \leq \|H_i \circ \cdots \circ H_1\|^{1/i} \leq N.$$
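Consider the graph $G' = G \oplus H$, that is, the Plünnecke graph $(G'_1, \ldots, G'_k)$ with $G'_j := G_j \oplus H_j$. Using the claim, we have

$$\|G'_k \circ \cdots \circ G'_1\| = \|G_k \circ \cdots \circ G_1\|\, \|H_k \circ \cdots \circ H_1\| \geq \frac{k^k}{N^k} \cdot \frac{N^k}{k^k} = 1,$$

which guarantees the assumption of Proposition 6.30 for $G'$. Applying this proposition to $G'$, we obtain for every $1 \leq i < k$

$$\|G'_i \circ \cdots \circ G'_1\| = \|G_i \circ \cdots \circ G_1\|\, \|H_i \circ \cdots \circ H_1\| \geq 1.$$

Since $\|H_i \circ \cdots \circ H_1\|^{1/i} \leq N$, it follows that

$$\|G_i \circ \cdots \circ G_1\|^{1/i} \geq \frac{1}{N} \geq \frac{1}{2k} \|G_k \circ \cdots \circ G_1\|^{1/k},$$

completing the proof.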

To deal with the case when $\|G_k \circ \cdots \circ G_1\|^{1/k} > 1$, we define $N$ to be the largest positive integer such that $\|G_k \circ \cdots \circ G_1\|^{1/k} \geq N$. Replacing the Plünnecke graph $(H_1, \ldots, H_k)$ by its transpose $(H_k^*, \ldots, H_1^*)$, formed by reversing all the arrows, one can easily verify that


1 é 
=N! < || H% o+ o Bt <N™. 
i 


The rest of the proof is similar. 


6.5.3 The second step 


The key ingredient of this step is a classical theorem due to Menger. Consider a directed graph $G$ and let $A$ and $B$ be two disjoint sets of vertices. We say that a set $C$ of vertices is a cut separating $A$ and $B$ if by removing $C$ we destroy all directed paths from $A$ to $B$ (a path is from $A$ to $B$ if it starts in $A$ and ends in $B$). Let $\Gamma$ be a collection of (mutually) vertex disjoint paths from $A$ to $B$ with maximum cardinality $N$. It is trivial that any cut $C$ has cardinality at least $N$, as $C$ should contain at least one vertex from each path in $\Gamma$. It turns out that this bound is always sharp:




Theorem 6.32 (Menger’s theorem) Let G, A, B, N be as above. Then there is 
a cut C with cardinality N separating A and B. 


For a proof of this classical theorem, see Section 6 of [238], or the exercises 
below. 
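Menger's theorem can be checked by brute force on a toy graph. The sketch below computes the maximum number of vertex-disjoint paths by a standard max-flow computation with each vertex split into an "in" and an "out" copy of capacity one (a technique swapped in here for convenience; it is not the argument used in the text), and compares it with the smallest vertex cut found by exhaustive search. As in the text, cut vertices may lie in $A$ or $B$ themselves. All names and the sample graph are illustrative assumptions.

```python
from itertools import combinations


def max_disjoint_paths(edges, sources, sinks):
    """Maximum number of vertex-disjoint directed paths from sources to sinks."""
    cap = {}

    def add(u, v):
        cap[(u, v)] = cap.get((u, v), 0) + 1
        cap.setdefault((v, u), 0)  # residual (reverse) arc

    vertices = {x for e in edges for x in e} | set(sources) | set(sinks)
    for v in vertices:
        add((v, "in"), (v, "out"))  # each vertex may be used by one path only
    for u, v in edges:
        add((u, "out"), (v, "in"))
    for s in set(sources):
        add("S", (s, "in"))
    for t in set(sinks):
        add((t, "out"), "T")
    adj = {}
    for (u, v) in cap:
        adj.setdefault(u, []).append(v)

    flow = 0
    while True:
        parent = {"S": None}
        stack = ["S"]
        while stack and "T" not in parent:  # search for an augmenting path
            u = stack.pop()
            for w in adj.get(u, []):
                if w not in parent and cap[(u, w)] > 0:
                    parent[w] = u
                    stack.append(w)
        if "T" not in parent:
            return flow
        w = "T"
        while parent[w] is not None:  # push one unit of flow along the path
            u = parent[w]
            cap[(u, w)] -= 1
            cap[(w, u)] += 1
            w = u
        flow += 1


def connected(edges, sources, sinks, removed):
    """True if some source still reaches some sink after deleting `removed`."""
    seen = {s for s in sources if s not in removed}
    stack = list(seen)
    while stack:
        u = stack.pop()
        for a, b in edges:
            if a == u and b not in removed and b not in seen:
                seen.add(b)
                stack.append(b)
    return any(t in seen for t in sinks)


def min_cut_size(edges, sources, sinks):
    """Size of a smallest vertex cut, by exhaustive search."""
    vertices = sorted({x for e in edges for x in e} | set(sources) | set(sinks))
    for r in range(len(vertices) + 1):
        for C in combinations(vertices, r):
            if not connected(edges, sources, sinks, set(C)):
                return r


# Hypothetical toy graph: A = {0, 1, 2}, B = {4, 5, 6}, with a bottleneck at vertex 3.
edges = [(0, 3), (1, 3), (2, 3), (3, 4), (3, 5), (3, 6), (2, 6)]
paths = max_disjoint_paths(edges, {0, 1, 2}, {4, 5, 6})
cut = min_cut_size(edges, {0, 1, 2}, {4, 5, 6})
print(paths, cut)
```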

Now consider a Plünnecke graph consisting of directed bipartite graphs $G_1 : A_0 \to A_1, \ldots, G_k : A_{k-1} \to A_k$. By the trick of replacing the ambient group $Z$ with $Z \times \mathbf{Z}$, and $A_j$ with $A_j \times \{j\}$, we can ensure that the $A_j$ are disjoint. Now let $G$ be the union of all the graphs $G_1, \ldots, G_k$; thus $G$ is a directed graph on $A_0 \cup \cdots \cup A_k$. Set $A = A_0$ and $B = A_k$, and let $\Gamma = \{\gamma_1, \ldots, \gamma_N\}$ be a maximum collection of vertex disjoint paths as above. By Theorem 6.32 we can find a vertex cut $C = \{c_1, \ldots, c_N\}$ in $G$ separating $A_0$ from $A_k$ such that $c_j \in V(\gamma_j)$ for all $1 \leq j \leq N$.

Since all the paths $\gamma_1, \ldots, \gamma_N$ start in $A_0$ and are vertex-disjoint, it is clear that $N \leq |A_0|$. The core of the proof is the following lemma.

Lemma 6.33 Under the assumption of Proposition 6.30, we have $N = |A_0|$.


Assuming this lemma, the rest of the proof is straightforward. If $N = |A_0|$, then every vertex $v$ in $A_0$ must be the initial vertex of exactly one path in $\Gamma$. Since these paths are vertex-disjoint, we thus see that $|G_i \circ \cdots \circ G_1(X)| \geq |X|$ for all $X \subseteq A_0$ and the claim follows.

In order to prove Lemma 6.33 we partition the cut $C$ as $C = C_0 \cup \cdots \cup C_k$ where $C_j := C \cap A_j$. The heart of the matter is the following lemma.


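Lemma 6.34 For any $1 \leq i \leq k - 1$, $C' := (C \setminus C_i) \cup C_i^-$ is also a cut in $G$ separating $A_0$ from $A_k$, where $C_i^- \subseteq A_{i-1}$ is the set of vertices immediately preceding the vertices of $C_i$ along the paths of $\Gamma$.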


Applying Lemma 6.34 iteratively, we can conclude that there is a cut which concentrates on $A_0$ and $A_k$. The union $C_0 \cup C_k$ (where $C_0 \subseteq A_0$, $C_k \subseteq A_k$) is a cut if and only if all paths starting from a point in $X = A_0 \setminus C_0$ end in $C_k$. The definition of the magnification ratio implies that

$$\|G_k \circ \cdots \circ G_1\| \leq \frac{|C_k|}{|X|}.$$

On the other hand, $|C_0| + |C_k| = N$ and $|X| = |A_0| - |C_0|$. Since

$$\|G_k \circ \cdots \circ G_1\| \geq 1,$$

it follows that $N \geq |A_0|$, proving Lemma 6.33.

It remains to prove the critical Lemma 6.34. This proof is actually the only place where one needs to utilize the commutativity property of the consecutive pairs $G_i, G_{i+1}$. Consider $C_i$ as in the lemma. We can assume that $C_i$ is not empty (otherwise there is nothing to prove). Let $C_i = \{c_1, \ldots, c_m\}$ for some $1 \leq m \leq N$.

Fix a maximum collection of mutually disjoint paths. For each $1 \leq j \leq m$, $c_j$ is a vertex of exactly one path $\gamma_j$ from this collection. Thus, there exist unique $c_j^- \in A_{i-1}$ and $c_j^+ \in A_{i+1}$ such that the edges $(c_j^- \to c_j)$ and $(c_j \to c_j^+)$ lie in $\gamma_j$. Let $C_i^\pm \subseteq A_{i \pm 1}$ denote the sets $C_i^\pm := \{c_1^\pm, \ldots, c_m^\pm\}$. Since the paths $\gamma_j$ are vertex-disjoint, we have $|C_i^-| = |C_i| = |C_i^+|$. Also, $C_i^\pm$ must be disjoint from $C$, since each path $\gamma_j$ in the collection contains exactly one cut point.

Suppose for contradiction that $C'$ was not a cut, i.e., there was a path $\gamma$ from $A_0$ to $A_k$ which did not intersect $C'$. But since $C$ is a cut, $\gamma$ must intersect $C$. This forces $\gamma$ to intersect $A_{i-1}$ at a vertex $v \in A_{i-1}$ which does not lie in either $C_{i-1}$ or $C_i^-$. Furthermore, the intersection of $\gamma$ with $C$ is a point in $C_i$. Let us define $s_1$ to be the number of edges from $C_i^-$ to $C_i$, $s_2$ to be the number of edges from $C_i^- \cup \{v\}$ to $C_i$ and $s_3$ to be the number of edges from $C_i$ to $C_i^+$. In order to obtain a contradiction, we are going to prove the following three mutually inconsistent inequalities

$$s_1 < s_2, \qquad s_2 \leq s_3, \qquad s_3 \leq s_1.$$


The first (strict) inequality $s_1 < s_2$ is trivial, as $v$ does not belong to $C_i^-$ and there is an edge from $v$ to $C_i$ along the path $\gamma$. To prove $s_3 \leq s_1$, we are going to construct an injective map from the edges from $C_i$ to $C_i^+$ to the edges from $C_i^-$ to $C_i$. Take any edge $c_j \to c_{j'}^+$ from $C_i$ to $C_i^+$, for some $1 \leq j, j' \leq m$. Since $G_i$ and $G_{i+1}$ are commutative and $(c_j^- \to c_j) \in G_i$, $(c_j \to c_{j'}^+) \in G_{i+1}$, we see that $(c_j^- \to c') \in G_i$ and $(c' \to c_{j'}^+) \in G_{i+1}$, where $c' := c_j^- + c_{j'}^+ - c_j$. Furthermore, $c'$ must lie in $C_i$, otherwise we could find a path from $A_0$ to $A_k$ avoiding the cut $C$ by using $\gamma_j$ to travel to $c_j^-$, then passing through $c'$ to $c_{j'}^+$, and then using $\gamma_{j'}$ to travel to $A_k$. Thus we obtain an edge $(c_j^- \to c')$ from $C_i^-$ to $C_i$. One can easily verify that this map is injective.

The proof of the remaining inequality is similar. When dealing with an edge from $v$, we, naturally, construct an avoiding path by using $\gamma$ up to $v$.


Exercises 


6.5.1  Show that one can take the set $X$ in Corollary 6.28 to be as large as $(1 - \varepsilon)|A|$ for any $\varepsilon > 0$, at the cost of replacing the factor $K^k$ with $(K/\varepsilon)^k$. (Hint: apply Corollary 6.28 repeatedly, removing $X$ from $A$ at each iteration.)

6.5.2 Prove Claim 6.31 and Claim 6.4. 

6.5.3  By induction on the number of edges in a graph $G(V, E)$, show that if the minimal cut needed to disconnect $A$ and $B$ has size $N$, then there exist $N$ disjoint paths from $A$ to $B$. (Hint: if there exists a minimal cut $C$ that spans at least one edge $\{x, y\}$, then remove this edge and construct $N$ disjoint paths from $A$ to $C$ and from $C$ to $B$. If instead every minimal cut is independent, take an edge $\{x, y\}$ and contract it by identifying $x$ with $y$ (and removing the resulting loop). Show that the resulting “quotient graph” still has minimal cut $N$ and apply the induction hypothesis.) Deduce Menger’s theorem as a corollary.

6.5.4  Let $A$ be an additive set. Show that the sequence of real numbers $|nA|^{1/n}$ is non-increasing in $n$, i.e. $|mA|^{1/m} \geq |nA|^{1/n}$ for all $n \geq m \geq 1$.

6.5.5  [297] Let $N$ be a large integer, and let $A, B \subset \mathbf{Z}^3$ be the sets $A := ([1, N] \times [1, N] \times \{0\}) \cup (\{(0, 0)\} \times [1, N])$ and $B := ([1, N] \times \{0\} \times \{0\}) \cup (\{0\} \times [1, N] \times \{0\})$. Show that $|A| = \Theta(N^2)$, $|B| = \Theta(N)$, and $|A + B| = \Theta(N^2)$ but $|A + 2B| = \Theta(N^3)$.

6.5.6  [297] Let $A, B$ be additive sets in an ambient group $Z$. Show that $|A + 2B| \leq \frac{|A + B|^2}{|A|^{1/3}}$. (Hint: use Exercise 6.5.1 to estimate $|A' + 2B|$ for a large $A' \subseteq A$, and use the crude bound $|(A \setminus A') + 2B| \leq |A \setminus A'|\,|2B|$ and Corollary 6.29 to estimate the remainder. Use the tensor power trick as in Corollary 2.19 to eliminate any constants you encounter.) Compare this with Exercise 6.5.5.

6.5.7  Let $0 < \delta < 1$. Show that there exist additive sets $A, B$ in an ambient group $Z$ such that $|A + B| = \Theta(|A|)$ but such that for every subset $A'$ of $A$ for which $|A'| \geq (1 - \delta)|A|$, we have $|A' + B + B| = \Omega(|A|/\delta)$. (Hint: adapt the example in Exercise 6.5.5.)

6.5.8  Prove Corollary 6.29.

6.5.9  Suppose that $A, B$ are additive sets with common ambient group such that $|A + B| \leq K|A|$ and $|2B| \leq K|B|$. Show that $|A + nB - mB| \leq K^{2\max(n,m)+3}|A|$ for all $n, m \geq 0$. (Hint: use Ruzsa’s covering lemma, Lemma 2.14.) Compare this with Exercise 6.5.5.

6.5.10  [297] Let $d$ be a large integer, let $M$ be the nearest integer to $(7/6)^d$, and in the ambient group $\mathbf{Z}_7^d \times \mathbf{Z}$ let $A := (\mathbf{Z}_7^d \times \{0\}) \cup (\{0, 1, 3\}^d \times [1, M])$ and $B := \{0, 1, 3\}^d \times \{0\}$. Show that $|A| = \Theta(7^d)$, $|B| = \Theta(3^d)$, $|A + B| = \Theta(7^d)$, but $|A - B| = \Theta((49/6)^d)$. Thus even if $|A + B|$ is comparable to $|A|$, $|A - B|$ can be as large as $|A|^{2 - \log_7 6}$.

6.5.11  [297] Let $d$ be a large integer, let $B = \{e_1, \ldots, e_{2d}\}$ be the standard basis of $\mathbf{Z}^{2d}$, and let $A = dB$. Show that $|A + B| = \Theta(|A|)$ but $|A - B| = \Theta(|A| \log |A|)$. More generally, show that $|A' - B| = \Theta(|A'| \log |A|)$ for any non-empty subset $A'$ of $A$. This shows that there is no analog of Corollary 6.29 for $n = -1$ unless one is willing to lose a logarithmic factor. On the other hand, see Exercise 6.5.12 below.

6.5.12  Let $A, B$ be additive sets with common additive group such that $|A + B| \leq K|A|$, and let $N \geq 1$. Show that there exists an additive set $A'$ in $A$ with $|A'| \geq \frac{1}{2}|A|$ and $|A' - B| \leq (4K)^{2^N(1 + 1/N)} |A|^{1+1/N}$. Compare this with Exercise 6.5.11. (Hint: first use Exercise 6.5.1 to locate a large set $A'$ such that $|A' + 2^N B| \leq (4K)^{2^N}|A'|$. Then use the pigeonhole principle to find $0 \leq j < N$ such that $|2^{j+1}B| \leq ((4K)^{2^N}|A|)^{1/N}\, |2^j B|$. Then control $|A' - B|$ by $|A' - 2^j B|$ and use Ruzsa’s triangle inequality.)

6.5.13  Let $A, B$ be non-empty subsets of $\mathbf{F}_p$ such that $p^\delta \leq |A|, |B| \leq p^{1-\delta}$ for some $0 < \delta < 1$. Show that there exists an $\varepsilon = \varepsilon(\delta) > 0$ depending only on $\delta$ such that either $|A + B| \geq p^\varepsilon |A|$ or $|B \cdot B| \geq p^\varepsilon |B|$. (You will of course need the results from Section 2.8.)

6.5.14  Let $G_1 : A_0 \to A_1$ and $G_2 : A_1 \to A_2$ be abstract directed graphs (not necessarily living in an additive group). We say that $G_1$ and $G_2$ are abstractly commutative if for every edge $a_1 \to_{G_2} a_2$ and any collection of edges $a_0^1 \to_{G_1} a_1, \ldots, a_0^n \to_{G_1} a_1$ it is possible to find $n$ forward paths $a_0^1 \to_{G_1} b^1 \to_{G_2} a_2, \ldots, a_0^n \to_{G_1} b^n \to_{G_2} a_2$ with $b^1, \ldots, b^n$ all disjoint, and similarly if $G_1, G_2$ are replaced by their transposes $G_2^*, G_1^*$. Show that the commutative property implies the abstract commutative property, and furthermore the Plünnecke inequalities still hold if the commutative property is replaced with the abstract commutative property. Thus while the Plünnecke inequalities do require some additive structure on the underlying graph (and in particular the commutativity of the underlying group), the amount of structure needed is fairly minimal.

6.5.15  Improve the upper bound in (2.11) to $\sigma[A] \leq e^{2d(A,A)}$, or equivalently that $|A + A| \leq \frac{|A - A|^2}{|A|}$. Note that this gives another proof of the inequality $|A + A| \leq |A - A|^2$ (Exercise 2.3.13).

6.5.16  Obtain improvements to Corollary 2.23 and Corollary 2.24. Obtain as sharp a value of the constants as you can.

6.5.17  [Ben Green and Imre Ruzsa, private communication] Let $\pi : Z \to Z'$ be a group homomorphism, and let $A$ be an additive set in $Z$. Show that $\sigma[\pi(A)] \leq \sigma[A]^2$ (compare with Exercises 2.2.10 and 2.3.8). Hint: use Plünnecke’s theorem to find a subset $X \subseteq A$ with $|X + 2A| \leq \sigma[A]^2 |X|$ small. Let $M$ be the largest multiplicity of $\pi$ on $X$. Establish the bounds $|X + 2A| \geq M|2\pi(A)|$ and $M|\pi(A)| \geq |X|$.

6.5.18  Use the preceding exercise to obtain sharper bounds in Corollary 5.43.

6.5.19  [162] Let $A \subset \mathbf{R}^d$ be an additive set containing the cube $\{0, 1\}^d$. Show that $|A + A| \geq 2^{d/2}|A|$. (Hint: from Exercise 3.4.8 we know that $|B + A + \{0, 1\}^d| \geq 2^d |B|$ for all subsets $B$ of $A$. Now use the Plünnecke inequality.) In the converse direction, show that there exist arbitrarily large sets $A$ containing $\{0, 1\}^d$ with doubling constant comparable to $2^{d/2}$.





7 





The Littlewood—Offord problem 


Let $v_1, \ldots, v_d$ be $d$ elements of an additive group $Z$ (which we refer to as the steps). Consider the $2^d$ sums $\epsilon_1 v_1 + \cdots + \epsilon_d v_d$ with $\epsilon_1, \ldots, \epsilon_d \in \{-1, 1\}$. In this chapter we investigate the largest possible repetitions among these sums.

We are going to consider two, opposite, problems: 


• The Littlewood–Offord problem, which is to determine, given suitable non-degeneracy conditions on $v_1, \ldots, v_d$ and $Z$ (e.g. excluding the trivial case when all of the steps are zero), the largest possible repetition or concentration that can occur among these sums.

• The inverse Littlewood–Offord problem, which supposes as a hypothesis that the $v_1, \ldots, v_d$ have a large number of repeated sums, or sums concentrating in a small set, and asks what one can then deduce as a consequence on the steps $v_1, \ldots, v_d$.


These two problems have a similar flavor to that of sum set estimates and inverse sum set estimates respectively, and occur naturally in certain problems of additive combinatorics, in particular in considering the set of subset sums $FS(A) = \{\sum_{a \in B} a : B \subseteq A\}$ of a given set $A$, or in the determinant and singularity properties of random matrices with entries $\pm 1$. These problems have also arisen in several other contexts, ranging from the zeroes of complex polynomials (which was the original motivation of Littlewood and Offord [237]), to database security (see [163]). Note that the problem of determining which elements are representable as a sum $\epsilon_1 v_1 + \cdots + \epsilon_d v_d$ is essentially the notorious subset-sum problem, which is known to be NP-complete in general. Furthermore, by thinking of the sum $\sum_{i=1}^d \epsilon_i v_i$ as a random variable depending on the atom variables $\epsilon_i$, we can view the Littlewood–Offord problem as a special case of the problem of computing the probability distribution of a random variable, which is a well-developed topic in probability theory.







In this chapter, we present two different approaches. The first is the combi- 
natorial approach of Erdős and later authors, which phrases the problem in the 
theory of set systems (collections of subsets of a given set), thus allowing one to 
apply the theory of extremal set systems. This approach is very elegant and gives 
sharp results, but it is difficult to extend it to cases in which one has more com- 
plicated constraints on the steps v;. The second, and rather different approach, 
is the Fourier-analytic one introduced by Halász. The bounds obtained by this 
approach are usually off by an absolute constant from the best possible results, but 
the arguments are more flexible. 

A general theme will be that strong concentration or repetition of the above sums is closely related to strong additive structure among the steps $v_1, \ldots, v_n$. At one extreme, if the group $Z$ has no 2-torsion, then all the sums are distinct if and only if the $v_1, \ldots, v_n$ are dissociated (see Definition 4.32). At another extreme, if the $v_1, \ldots, v_n$ are contained inside an arithmetic progression of small rank and volume, then one expects plenty of repetitions among the sums. The situation is thus somewhat analogous to the theory of sum set estimates and inverse sum set theorems studied in previous chapters, and indeed there will be strong similarities in our treatment of the two (in particular, the parallel use of combinatorial and Fourier-analytic methods).


7.1 The combinatorial approach 


The fundamental concept in this approach is that of an anti-chain. 


Definition 7.1 (Anti-chains) A collection $\mathcal{A}$ of sets is known as an anti-chain if none of the sets is contained in any other; thus $A \not\subseteq B$ for any distinct $A, B \in \mathcal{A}$.


Anti-chains are sometimes also referred to as Sperner systems, especially in 
older literature. 


Lemma 7.2 (LYM inequality) [240], [246], [385] Let $\mathcal{A}$ be an anti-chain of subsets of a finite set $X$. Then we have

$$\sum_{A \in \mathcal{A}} \frac{1}{\binom{|X|}{|A|}} \leq 1.$$
Proof We give a probabilistic proof of Bollobás, using Katona’s method of random maps. Let $\phi : X \to [1, |X|]$ be a random bijection from $X$ to $[1, |X|]$, chosen uniformly at random among all $|X|!$ such bijections. A simple combinatorial argument shows that

$$\mathbf{P}(\phi(A) = [1, |A|]) = \frac{1}{\binom{|X|}{|A|}}$$

for each $A \in \mathcal{A}$. On the other hand, since none of the $A$ are contained in each other, the events $\phi(A) = [1, |A|]$ are disjoint. Thus, the sum of their probabilities is bounded by 1, which implies the claim.
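The LYM sum is easy to evaluate exactly for small families. The following sketch checks the anti-chain property and computes the sum with exact rational arithmetic; the two sample families on a five-element ground set are hypothetical examples chosen for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def is_antichain(family):
    """True if no member of the family is a proper subset of another member."""
    return not any(a < b for a in family for b in family)


def lym_sum(family, n):
    """Sum of 1 / C(n, |A|) over the family, for a ground set of size n."""
    return sum(Fraction(1, comb(n, len(a))) for a in family)


n = 5
# All 2-element subsets of {0,...,4}: the LYM sum equals 1 exactly.
middle = [frozenset(c) for c in combinations(range(n), 2)]
# A mixed-size anti-chain: the LYM sum is strictly below 1.
mixed = [frozenset({0}), frozenset({1, 2}), frozenset({1, 3}), frozenset({2, 3, 4})]

print(is_antichain(middle), lym_sum(middle, n))
print(is_antichain(mixed), lym_sum(mixed, n))
```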














From the obvious inequality $\binom{|X|}{|A|} \leq \binom{|X|}{\lfloor |X|/2 \rfloor}$ we immediately conclude


Corollary 7.3 (Sperner’s lemma) [332] Let $\mathcal{A}$ be an anti-chain of subsets of a finite set $X$. Then $|\mathcal{A}| \leq \binom{|X|}{\lfloor |X|/2 \rfloor}$.


Note that the bound is clearly optimal, as can be seen by taking $\mathcal{A}$ to be the anti-chain consisting of all subsets of $X$ of cardinality $\lfloor |X|/2 \rfloor$.
We can apply Sperner’s lemma to the Littlewood–Offord problem as follows.


Corollary 7.4 [82] Let $v_1, \ldots, v_n$ be real numbers with $|v_i| \geq 1$ for all $i$. Let $I = \{x : x_0 - 1 < x < x_0 + 1\}$ be an open interval of length 2. Then the total number of $n$-tuples $(\epsilon_1, \ldots, \epsilon_n) \in \{-1, 1\}^n$ with $\epsilon_1 v_1 + \cdots + \epsilon_n v_n \in I$ is at most $\binom{n}{\lfloor n/2 \rfloor}$.


Proof By reversing the signs of some of the $v_i$ if necessary, we may assume that $v_i \geq 1$ for all $i$. Now let $\mathcal{A}$ be the set of all subsets $A$ of $[1, n]$ such that $\sum_{i \in A} v_i - \sum_{i \notin A} v_i \in I$. One can easily verify that $\mathcal{A}$ is an anti-chain, and hence by Sperner’s lemma $|\mathcal{A}| \leq \binom{n}{\lfloor n/2 \rfloor}$. The claim follows.
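Corollary 7.4 can be checked by exhaustive enumeration for small $n$. The sketch below lists all $2^n$ signed sums and uses a sliding window to find the largest number of them that fit in some open interval of length 2; a set of sums fits in such an interval exactly when its largest and smallest members differ by less than 2. The function name and the sample step vectors are assumptions made for the illustration, and the steps are taken to be integers or `Fraction`s so that comparisons are exact.

```python
from itertools import product
from math import comb


def max_interval_count(v, half_width=1):
    """Largest number of sign patterns whose signed sum lies in one open interval of length 2 * half_width."""
    sums = sorted(
        sum(e * x for e, x in zip(eps, v))
        for eps in product((-1, 1), repeat=len(v))
    )
    best = 0
    lo = 0
    for hi in range(len(sums)):
        # Shrink the window until its spread is strictly less than the interval length.
        while sums[hi] - sums[lo] >= 2 * half_width:
            lo += 1
        best = max(best, hi - lo + 1)
    return best


n = 6
equal_steps = max_interval_count([1] * n)            # all steps equal to 1
mixed_steps = max_interval_count([1, 2, 3, 4, 5, 6])  # distinct steps
print(equal_steps, mixed_steps, comb(n, n // 2))
```

With all steps equal to 1 the bound $\binom{n}{\lfloor n/2 \rfloor}$ is attained, which matches the remark that Sperner's bound is optimal.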


Now let us give a different proof of Sperner’s lemma. We need to complement 
the notion of an anti-chain with that of a chain. 


Definition 7.5 (Chains) A chain is a sequence of sets $A_1, \ldots, A_m$ such that $A_i \subset A_{i+1}$ for all $1 \leq i < m$; we refer to $m$ as the length of the chain. We say a chain is connected if $|A_{i+1} \setminus A_i| = 1$ for all $1 \leq i < m$. A connected chain in a finite set $X$ is said to be centered if $|A_1| + |A_m| = |X|$, or equivalently if $|A_i| = \frac{|X| - m - 1}{2} + i$ for all $1 \leq i \leq m$. Note that the length of a centered connected chain has to have the opposite parity to $|X|$.


Lemma 7.6 (Chain decomposition lemma) [206] Let $X$ be a finite set, and let $2^X := \{A : A \subseteq X\}$ be the power set of $X$. Then $2^X$ can be partitioned into disjoint non-empty centered connected chains.


Proof We induce on $|X|$. The cases $|X| = 0, 1$ are trivial. Now suppose that $|X| > 1$ and the claim has already been proven for all smaller $X$. Write $X = X' \cup \{x_0\}$ where $|X'| = |X| - 1$. By hypothesis, $2^{X'}$ can be partitioned into disjoint non-empty centered connected chains in $X'$. For each such chain $A_1, \ldots, A_m$, observe that the chains

$$A_1, \ldots, A_m, A_m \cup \{x_0\}$$

and

$$A_1 \cup \{x_0\}, \ldots, A_{m-1} \cup \{x_0\}$$

are connected centered chains in $2^X$, and can easily be seen to partition $2^X$. Note that the chains of the second type may be empty, but they can of course be omitted from the partition without difficulty. The claim follows.
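The induction in this proof is constructive, and can be turned directly into a short program. The sketch below builds the chains for the ground set $\{0, \ldots, n-1\}$ by adding one element at a time and applying the two chain types from the proof; the function names are illustrative assumptions.

```python
from math import comb


def chain_decomposition(n):
    """Partition the power set of {0, ..., n-1} into centered connected chains."""
    chains = [[frozenset()]]  # base case |X| = 0: the single chain consisting of the empty set
    for x0 in range(n):
        new_chains = []
        for chain in chains:
            # First type: A_1, ..., A_m, A_m ∪ {x0}.
            new_chains.append(chain + [chain[-1] | {x0}])
            # Second type: A_1 ∪ {x0}, ..., A_{m-1} ∪ {x0}; omitted when it is empty.
            if len(chain) > 1:
                new_chains.append([a | {x0} for a in chain[:-1]])
        chains = new_chains
    return chains


def is_centered_connected(chain, n):
    """Check |A_{i+1} \\ A_i| = 1 with A_i inside A_{i+1}, and |A_1| + |A_m| = n."""
    connected = all(a < b and len(b - a) == 1 for a, b in zip(chain, chain[1:]))
    return connected and len(chain[0]) + len(chain[-1]) == n


n = 5
chains = chain_decomposition(n)
all_sets = [a for chain in chains for a in chain]
print(len(chains), comb(n, n // 2), len(all_sets), len(set(all_sets)))
```

Counting the chains recovers Sperner's bound, since every chain passes through exactly one set of size $\lfloor n/2 \rfloor$.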














Every centered connected chain in $X$ has to contain exactly one set of cardinality $\lfloor |X|/2 \rfloor$. Thus the total number of chains in Lemma 7.6 is exactly $\binom{|X|}{\lfloor |X|/2 \rfloor}$. More generally, we see the number of centered connected chains of length $m$ given by this lemma is exactly $\binom{|X|}{(|X|-m+1)/2} - \binom{|X|}{(|X|-m-1)/2}$ if $m$ has the opposite parity of $|X|$, and 0 otherwise.

Since an anti-chain can contain at most one element of every chain, we obtain a new proof of Sperner’s lemma (compare also with Menger’s theorem, Theorem 6.32). In fact, the same argument gives the following generalization.


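Proposition 7.7 [82] Let $\mathcal{A}_1, \ldots, \mathcal{A}_k$ be $k$ disjoint anti-chains of subsets of a finite set $X$. Then $|\mathcal{A}_1| + \cdots + |\mathcal{A}_k|$ is at most the sum of the $k$ largest binomial coefficients $\binom{|X|}{j}$, that is,

$$|\mathcal{A}_1| + \cdots + |\mathcal{A}_k| \leq \sum_{i=-\lceil k/2 \rceil + 1}^{\lfloor k/2 \rfloor} \binom{|X|}{\lfloor |X|/2 \rfloor + i}.$$

We leave the proof of this proposition as an exercise. We can then extend Corollary 7.4 without difficulty: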


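Corollary 7.8 (Erdős’s Littlewood–Offord inequality) [82] Let $v_1, \ldots, v_n$ be real numbers with $|v_i| \geq 1$ for all $i$. Let $I = \{x : x_0 - k < x < x_0 + k\}$ be an open interval of length $2k$ for some integer $k \geq 1$. Then the total number of $n$-tuples $(\epsilon_1, \ldots, \epsilon_n) \in \{-1, 1\}^n$ with $\epsilon_1 v_1 + \cdots + \epsilon_n v_n \in I$ is at most the sum of the $k$ largest binomial coefficients $\binom{n}{j}$, that is, at most $\sum_{i=-\lceil k/2 \rceil + 1}^{\lfloor k/2 \rfloor} \binom{n}{\lfloor n/2 \rfloor + i}$.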


One can replace the real numbers R by higher-dimensional spaces, such as the 
complex numbers C. To do this, we need a product form of Sperner’s lemma, as 
follows. 


Lemma 7.9 (Product Sperner lemma) [206] Let $X$ and $Y$ be finite sets, and let $\mathcal{A}$ be a collection of pairs $(A, B)$ of subsets of $X$, $Y$, which are a product anti-chain in the sense that there are no distinct pairs $(A, B)$, $(A', B')$ in $\mathcal{A}$ with either $A = A'$ and $B \subset B'$, or $A \subset A'$ and $B = B'$. (To put it another way, for each fixed $B$, the collection of $A$ for which $(A, B) \in \mathcal{A}$ forms an anti-chain, and vice versa.) Then

$$|\mathcal{A}| \leq \binom{|X| + |Y|}{\lfloor (|X| + |Y|)/2 \rfloor}.$$


We leave the proof of this lemma as an exercise. As a consequence we have the 
complex version of Corollary 7.4. 




Corollary 7.10 [206] Let $v_1, \ldots, v_n$ be complex numbers with $|v_i| \geq 1$ for all $i$. Let $B = \{z : |z - z_0| < 1\}$ be a ball of radius 1. Then the total number of $n$-tuples $(\epsilon_1, \ldots, \epsilon_n) \in \{-1, 1\}^n$ with $\epsilon_1 v_1 + \cdots + \epsilon_n v_n \in B$ is at most $\binom{n}{\lfloor n/2 \rfloor}$.


Proof By randomly rotating the complex plane we may assume that none of the $v_i$ are purely real or purely imaginary. By reversing the signs of some of the $v_i$ if necessary we may assume that $\mathrm{Im}\, v_i > 0$ for all $i$. Let $X$ be the set of all $i$ with $\mathrm{Re}\, v_i > 0$, and $Y$ be the set of all $i$ with $\mathrm{Re}\, v_i < 0$; thus $X \cup Y = [1, n]$. Now let $\mathcal{A}$ be the set of all pairs $(A, B)$ of sets $A \subseteq X$, $B \subseteq Y$ such that $\sum_{i \in A \cup B} v_i - \sum_{i \notin A \cup B} v_i$ lies in the ball $\{z : |z - z_0| < 1\}$. One can easily verify that $\mathcal{A}$ is a product anti-chain in the sense of Lemma 7.9, and the claim follows.














In fact one has the analogous claim in general dimension, by a more sophisti- 
cated version of this argument; see [207]. 

This is only the tip of the iceberg concerning extremal combinatorics results of 
this type; see for instance [32] for a much more detailed treatment of these topics. 
Variants of this approach have also been successfully applied in cyclic groups; 
see [163]. 


Exercises 


7.1.1  (Set-pair estimate) [31] Let $A_1, \ldots, A_m, B_1, \ldots, B_m$ be finite sets such that $A_i \cap B_j = \emptyset$ if and only if $i = j$. Show that

$$\sum_{i=1}^m \frac{1}{\binom{|A_i| + |B_i|}{|A_i|}} \leq 1.$$

Note that this includes Lemma 7.2 as a special case (where $B_i := X \setminus A_i$).

7.1.2  (Erdős–Ko–Rado theorem) [94] Let $A_1, \ldots, A_m$ be an anti-chain in $\mathbf{Z}_N$ such that any two $A_i, A_j$ intersect (thus $A_i \cap A_j \neq \emptyset$ for all $i, j$), and $|A_i| \leq k$ for all $i$ and some $k \leq N/2$. Show that $m \leq \binom{N-1}{k-1}$ and show that this bound is sharp. (Hint: first show that for any bijection $\phi : \mathbf{Z}_N \to \mathbf{Z}_N$, at most $k$ of the sets $\phi(A_i)$ can be an interval of the form $[a + 1, a + |A_i|]$ for some $a \in \mathbf{Z}_N$; this elegant argument is due to Katona [196].)

7.1.3  Prove Proposition 7.7. (Hint: for any chain of length $m$, observe that at most $\min(m, k)$ elements of this chain can lie in $\mathcal{A}_1 \cup \cdots \cup \mathcal{A}_k$. Now count how many chains there are of a given length in Lemma 7.6.)

7.1.4  Prove Lemma 7.9. (Hint: if $A_1, \ldots, A_m$ is a connected chain in $X$, and $B_1, \ldots, B_n$ is a connected chain in $Y$, show that there are at most $\min(m, n)$ pairs of the form $(A_i, B_j)$ in $\mathcal{A}$. Alternatively, decompose $2^Y$ into chains $\mathcal{B}_1, \ldots, \mathcal{B}_a$, and for each such chain apply Proposition 7.7.)




7.2 The Fourier-analytic approach 


Now we present the Fourier-analytic approach of Halász. It is convenient to use the language of probability theory. For any $n$-tuple $v = (v_1, \ldots, v_n)$ of steps in an additive group $Z$, we use the notation $X_v$ to denote the random variable

$$X_v := \epsilon_1 v_1 + \cdots + \epsilon_n v_n$$

where $\epsilon_1, \ldots, \epsilon_n$ are independent random variables taking values in $\{-1, +1\}$ with probability $1/2$ for each value. Clearly $\mathbf{P}(X_v = x)$ equals the number of representations of $x$ as $\epsilon_1 v_1 + \cdots + \epsilon_n v_n$ with $\epsilon_1, \ldots, \epsilon_n \in \{-1, 1\}$, divided by $2^n$. Note that $X_v$ is invariant under permutations of the $n$-tuple $v$. We use $vw$ to denote the concatenation of $v$ and $w$. The Littlewood–Offord problem then asks to control the distribution of $X_v$ for a given $v$, while the inverse Littlewood–Offord problem asks for some structural information on $v$ given some unexpected distributional property of $X_v$.

It will be useful to consider the more general random variables $X_v^{(\mu)}$ for any $0 \leq \mu \leq 1$, defined as

$$X_v^{(\mu)} := \epsilon_1^{(\mu)} v_1 + \cdots + \epsilon_n^{(\mu)} v_n,$$

where $\epsilon_1^{(\mu)}, \ldots, \epsilon_n^{(\mu)}$ are independent random variables which take the values $+1$ and $-1$ with probability $\mu/2$, and 0 with probability $1 - \mu$. Thus $X_v^{(\mu)}$ is the same as $X_v$ when $\mu = 1$, and at the other extreme $\mu = 0$ becomes the constant 0. The intermediate cases correspond to “lazy random walks” with step sizes $v_1, \ldots, v_n$. As $\epsilon_i^{(\mu)}$ can be 0 with considerable probability, one expects $X_v^{(\mu)}$ to be more concentrated than $X_v$, and this will indeed be the case. In practice, the cases $\mu \leq 1/2$ are more amenable to Fourier analysis than the $\mu = 1$ case due to a certain “positivity” property which we shall come to shortly.

In this section we shall consider the discrete problem of understanding the probabilities $\mathbf{P}(X_v^{(\mu)} = x)$ that a random variable $X_v^{(\mu)}$ concentrates at a single point. In the next section we briefly discuss the analogous probability $\mathbf{P}(X_v^{(\mu)} \in Q)$ for concentration in a cube.

Let us first make some technical reductions to the problem. Firstly, we can reduce to the case when the ambient group $Z$ is finite. This can be achieved by applying a suitable Freiman isomorphism of order $n$ to the steps $v_1, \ldots, v_n$ (see Exercise 5.3.3) while noting that this does not affect the distribution of $X_v^{(\mu)}$. Secondly, we can reduce further to the case that $|Z|$ is odd. To see this, observe from Corollary 3.8 that any finite additive group can be written as the product of a 2-torsion group and a group of odd order. The behavior of the random variable $X_v^{(\mu)}$, when projected down to the 2-torsion group, is trivial (since $+v_i = -v_i$ in this group), so we may, without loss of generality, project onto the other factor. Note that if the original elements $v_1, \ldots, v_n$ lived in some torsion-free group such as $\mathbf{Z}^d$, then by Lemma 5.25 we could now place the vectors in a cyclic group of odd prime order. (In doing so we may temporarily obscure some of the “dimensional” structure of the elements $v_1, \ldots, v_n$, so in some cases it is convenient to revert back to the original ambient group at certain stages of the argument.)

With these reductions we can now express the distribution of $X_v^{(\mu)}$ in terms of the Fourier transform. As usual we fix a symmetric non-degenerate bilinear form $\xi \cdot x$ on $Z$.


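Lemma 7.11 (Fourier representation of $X_v^{(\mu)}$) Let $Z$ be a finite group of odd order. If $v = (v_1, \ldots, v_n)$ is an $n$-tuple of elements of $Z$, then for any $0 \leq \mu \leq 1$ and $x \in Z$ we have

$$\mathbf{P}(X_v^{(\mu)} = x) = \mathbf{E}_{\xi \in Z} \cos(2\pi \xi \cdot x) \prod_{j=1}^n (1 - \mu + \mu \cos(2\pi \xi \cdot v_j)).$$

Proof Since the quantity $\prod_{j=1}^n (1 - \mu + \mu \cos(2\pi \xi \cdot v_j))$ is an even function of $\xi$, we can write the right-hand side as

$$\mathbf{E}_{\xi \in Z}\, e(-\xi \cdot x) \prod_{j=1}^n (1 - \mu + \mu \cos(2\pi \xi \cdot v_j)).$$

Observing that $1 - \mu + \mu \cos(2\pi \xi \cdot v_j) = \mathbf{E}(e(\xi \cdot \epsilon_j^{(\mu)} v_j))$ and using the independence of the $\epsilon_j^{(\mu)}$, we can rewrite this as

$$\mathbf{E}\, \mathbf{E}_{\xi \in Z}\, e(\xi \cdot (X_v^{(\mu)} - x)).$$

But the claim now follows from Lemma 4.5.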
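The Fourier representation can be verified numerically in a small cyclic group. The sketch below takes $Z = \mathbf{Z}_p$ with the bilinear form $\xi \cdot x = \xi x / p$ (one concrete choice of form, assumed here for the illustration), computes the exact distribution of the lazy walk by repeated convolution, and compares it with the right-hand side of the lemma. Function names and the sample parameters are hypothetical.

```python
import math


def exact_distribution(v, mu, p):
    """Distribution of the lazy walk X_v^(mu) on Z_p, computed step by step."""
    dist = [0.0] * p
    dist[0] = 1.0
    for vj in v:
        new = [0.0] * p
        for x, pr in enumerate(dist):
            new[x] += (1 - mu) * pr             # step coefficient 0
            new[(x + vj) % p] += (mu / 2) * pr  # step coefficient +1
            new[(x - vj) % p] += (mu / 2) * pr  # step coefficient -1
        dist = new
    return dist


def fourier_formula(v, mu, p, x):
    """Average over xi in Z_p of cos(2 pi xi x / p) * prod_j (1 - mu + mu cos(2 pi xi v_j / p))."""
    total = 0.0
    for xi in range(p):
        term = math.cos(2 * math.pi * xi * x / p)
        for vj in v:
            term *= 1 - mu + mu * math.cos(2 * math.pi * xi * vj / p)
        total += term
    return total / p


p, mu, v = 7, 0.5, [1, 2, 2, 5]
dist = exact_distribution(v, mu, p)
error = max(abs(dist[x] - fourier_formula(v, mu, p, x)) for x in range(p))
print(error)
```

For $\mu \leq 1/2$ every factor in the product is non-negative, so the value at $x = 0$ dominates all others; the same arrays can be used to observe this.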














This lemma already highlights the special role of the case $0 \leq \mu \leq \frac{1}{2}$, as in this case $1 - \mu + \mu \cos(2\pi \xi \cdot v_j)$ becomes non-negative. In the further case $0 \leq \mu \leq \frac{1}{4}$ we have the elementary but very useful estimate

$$1 - \mu + \mu \cos(2\pi \xi \cdot v_j) = \exp\left(-\Theta\left(\mu \|\xi \cdot v_j\|_{\mathbf{R}/\mathbf{Z}}^2\right)\right) \qquad (7.1)$$

where we recall that $\|x\|_{\mathbf{R}/\mathbf{Z}}$ denotes the distance to the nearest integer.
From Lemma 7.11 we can immediately establish a number of useful bounds on how one distribution $X_v^{(\mu)}$ controls another.

Corollary 7.12 Letv = (vj,..., Un), W = (W1, ..., Wm) be tuples in an additive 
group Z which is torsion-free or is finite of odd order. Let x € Z. 


e (Domination) If0 < u < w < 1, and at least one of w < 1/2 or u < w'/4 
hold, then 


P(X) = x) < P(X =0) = Esez | [0 — u + wcos(2zé - v;)). 
jel 


7.2 The Fourier-analytic approach 283 


In particular, if $\mu \le 1/2$, then $X_v^{(\mu)}$ concentrates more at the origin than
anywhere else.
• (Duplication) If $0 < \mu \le 1/2$, then
\[ \mathbf{P}(X_v^{(\mu)} = x) \le \mathbf{P}(X_{v^k}^{(\mu/k)} = 0) \]
for all integers $k \ge 1$, where we use $v^k$ to denote the concatenation of $k$ copies
of $v$.
• (Hölder) If $w_1, \ldots, w_k$ are tuples in $Z$ (possibly of different length) and
$0 < \mu \le 1/2$, then
\[ \mathbf{P}(X^{(\mu)}_{w_1 \ldots w_k} = x) \le \prod_{i=1}^k \mathbf{P}(X^{(\mu)}_{w_i^k} = 0)^{1/k}. \]


Proof As discussed earlier we may take $Z$ to be finite of odd order. In all cases
we rewrite the probabilities using Lemma 7.11. The Hölder formula is clear, as is
the domination formula when $\mu' \le 1/2$. In the case $\mu \le \mu'/4$, one observes the
elementary inequality
\[ |\cos(\pi\theta)| \le \frac{3}{4} + \frac{1}{4} \cos(2\pi\theta) \]
and hence (by the triangle inequality)
\[ |(1 - \mu') + \mu' \cos(\pi\theta)| \le \Big(1 - \frac{\mu'}{4}\Big) + \frac{\mu'}{4} \cos(2\pi\theta). \]
The claim then follows from the change of variables $\xi \mapsto 2\xi$ (which is invertible
when $Z$ has odd order).
The duplication formula similarly follows from the elementary inequality
\[ (1 - \mu) + \mu \cos(2\pi\theta) \le \Big( \Big(1 - \frac{\mu}{k}\Big) + \frac{\mu}{k} \cos(2\pi\theta) \Big)^k, \]
which can be seen by taking logarithms and exploiting the concavity of $\log(1 - t)$
in the region $0 \le t < 1$.
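
The two elementary inequalities used in this proof are easy to spot-check on a grid. The following is a minimal sketch, not part of the original argument; the function names are hypothetical, and the check is numerical rather than a proof. Each function returns "right-hand side minus left-hand side", which should be non-negative.

```python
import math


def cos_inequality_gap(theta):
    """(3/4 + 1/4 cos(2 pi theta)) - |cos(pi theta)|; expected to be >= 0."""
    return 0.75 + 0.25 * math.cos(2 * math.pi * theta) - abs(math.cos(math.pi * theta))


def duplication_gap(mu, k, theta):
    """((1 - mu/k) + (mu/k) cos)^k - ((1 - mu) + mu cos); expected >= 0 for mu <= 1/2."""
    c = math.cos(2 * math.pi * theta)
    return ((1 - mu / k) + (mu / k) * c) ** k - ((1 - mu) + mu * c)


# Sample theta over one full period.
thetas = [i / 1000 for i in range(1001)]
worst_cos_gap = min(cos_inequality_gap(t) for t in thetas)
worst_dup_gap = min(
    duplication_gap(mu, k, t)
    for mu in (0.1, 0.25, 0.5)
    for k in (1, 2, 5, 16)
    for t in thetas
)
```

The first inequality is equivalent to $|c| \le (1 + c^2)/2$ with $c = \cos(\pi\theta)$, which is why equality occurs exactly where $|\cos(\pi\theta)| = 1$.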














The above corollary allows one to show that the quantity $\mathbf{P}(X_v^{(\mu)} = 0)$ is fairly
stable when one tinkers with the tuple $v$ (for instance, by adding or removing
duplicates) and the parameter $\mu$, at least when $\mu \le 1/2$. As an application, let us
give a Fourier-analytic analog of Corollary 7.4.


Corollary 7.13 Let $v = (v_1, \ldots, v_n)$ be an $n$-tuple in a torsion-free group $Z$ such
that at least $k$ of the $v_j$ are non-zero. Then for all $0 < \mu \le 1$ and $x \in Z$ we have
\[ \mathbf{P}(X_v^{(\mu)} = x) = O\Big( \frac{1}{\sqrt{\mu k}} \Big). \]




Proof Using the domination property we may take $\mu \le 1/2$. Without loss of generality
we may take $v_1, \ldots, v_k$ to be non-zero. Applying Corollary 7.12 repeatedly
we have
\[ \mathbf{P}(X_v^{(\mu)} = x) \le \mathbf{P}(X^{(\mu)}_{v_1 \ldots v_k} = 0) \le \mathbf{P}(X^{(\mu)}_{v_j^k} = 0) \]
for some $1 \le j \le k$. The latter quantity is a standard quantity in the theory of random
walks$^1$ and can be computed combinatorially using Stirling's formula (1.52),
but we present here a Fourier-analytic approach. We can map $v_j$ via a Freiman
isomorphism to the identity 1 in a large cyclic group $\mathbf{Z}_N$, and use Lemma 7.11 to
conclude
\[ \mathbf{P}(X^{(\mu)}_{v_j^k} = 0) = \mathbf{E}_{\xi \in \mathbf{Z}_N} (1 - \mu + \mu \cos(2\pi \xi / N))^k \]
and thus, on taking limits as $N \to \infty$,
\[ \mathbf{P}(X^{(\mu)}_{v_j^k} = 0) = \int_0^1 (1 - \mu + \mu \cos(2\pi \xi))^k \, d\xi. \]
Using (7.1), it suffices to bound $\int_0^1 \exp(-\Theta(k \mu \|\xi\|_{\mathbf{R}/\mathbf{Z}}^2)) \, d\xi$. It is easy to show that most
of the weight of this integral is in the interval $(0, C/\sqrt{\mu k})$ for some large constant
$C$. The claim follows.
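
The $O(1/\sqrt{\mu k})$ decay can be observed directly for the model walk with $k$ unit steps. The sketch below is illustrative only and assumes the lazy-step convention (step $\pm 1$ with probability $\mu/2$ each, no move with probability $1 - \mu$); `return_probability` is a hypothetical helper name. It computes $\mathbf{P}(X^{(\mu)}_{1^k} = 0)$ exactly by dynamic programming over the integers.

```python
def return_probability(k, mu):
    """P(X = 0) for a walk of k lazy unit steps on the integers.

    Assumption: each step is +1 or -1 with prob. mu/2 each and 0 with
    prob. 1 - mu. Positions are stored with an offset of k.
    """
    size = 2 * k + 1
    dist = [0.0] * size
    dist[k] = 1.0  # start at the origin
    for _ in range(k):
        new = [0.0] * size
        for i, p in enumerate(dist):
            if p == 0.0:
                continue
            new[i] += (1 - mu) * p
            if i + 1 < size:
                new[i + 1] += (mu / 2) * p
            if i - 1 >= 0:
                new[i - 1] += (mu / 2) * p
        dist = new
    return dist[k]


# sqrt(mu * k) * P(X = 0) should stay bounded as k grows.
normalised = {
    (k, mu): (mu * k) ** 0.5 * return_probability(k, mu)
    for k in (1, 4, 16, 64, 256)
    for mu in (0.1, 0.25, 0.5, 1.0)
}
```

In the sampled range the normalised values stay below 1, in line with the corollary; for $\mu = 1$ and even $k$ the value is the central binomial probability $\binom{k}{k/2} 2^{-k}$.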














We remark that in the case $\mu = 1$, Corollary 7.4 gives the sharp bound
\[ \mathbf{P}(X_v^{(1)} = x) \le \binom{k}{\lfloor k/2 \rfloor} 2^{-k} = O\Big( \frac{1}{\sqrt{k}} \Big) \]
thanks to Stirling's formula (1.52). This shows that the Fourier-analytic method
can give bounds which are sharp up to absolute constants.

If the steps $v_1, \ldots, v_n$ are sufficiently "high-dimensional" one can do better
than this $O(1/\sqrt{k})$ type bound; see Exercise 7.2.3.

Now let us give a deeper distributional inequality which relies in particular on 
the Cauchy—Davenport inequality (Theorem 5.4). 





Lemma 7.14 (Halász relative concentration inequality) [195] Let $Z$ be either
torsion-free or cyclic of odd prime order. Let $v$ be a tuple in $Z$. Then for any
$0 < \mu \le \mu' \le 1$ with $\mu \le 1/4$, we have
\[ \mathbf{P}(X_v^{(\mu')} = x) \le O\Big( \sqrt{\frac{\mu}{\mu'}}\, \mathbf{P}(X_v^{(\mu)} = 0) \Big) + O\big( \mathbf{P}(X_v^{(\mu)} = 0)^{\Omega(\mu'/\mu)} \big) \]
for all $x \in Z$.

$^1$ Indeed, a useful heuristic is to think of $X^{(\mu)}_{v^k}$ as behaving (up to constants) similarly to the uniform
distribution on the progression $[-\sqrt{\mu k}, \sqrt{\mu k}] \cdot v$; note that this heuristic is supported by the
Chernoff inequality.


Note that the domination inequality only gives $\mathbf{P}(X_v^{(\mu')} = x) \le \mathbf{P}(X_v^{(\mu)} = 0)$.
Thus Halász's inequality becomes superior when $\mu$ is significantly smaller than
$\mu'$, in which case it asserts that $X_v^{(\mu)}$ concentrates at the origin substantially more
often than $X_v^{(\mu')}$ does. For some further discussion and more quantitative versions
of this inequality, see [195], [364], [365].


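Proof Using the domination inequality we may assume that $\mu' \le 1/2$ and $x = 0$.
We may also take $\mu'/\mu$ to be large. By Corollary 5.25 we may take $Z = \mathbf{Z}_p$ for
some odd prime $p$. Introduce the functions $F, G : \mathbf{Z}_p \to \mathbf{R}^+$ by
\[ F(\xi) := \prod_{j=1}^n (1 - \mu' + \mu' \cos(2\pi \xi \cdot v_j)); \qquad G(\xi) := \prod_{j=1}^n (1 - \mu + \mu \cos(2\pi \xi \cdot v_j)); \]
then by Lemma 7.11 our task is to show that
\[ \mathbf{E}_{\mathbf{Z}_p}(F) = O\Big( \sqrt{\frac{\mu}{\mu'}}\, \mathbf{E}_{\mathbf{Z}_p}(G) \Big) + O\big( \mathbf{E}_{\mathbf{Z}_p}(G)^{\Omega(\mu'/\mu)} \big). \]
Now let $0 < \alpha < 1$ be arbitrary. Observe from (7.1) that if $\xi \in \mathbf{Z}_p$ is such that
$F(\xi) > \alpha$, then
\[ \Big( \sum_{j=1}^n \|\xi \cdot v_j\|_{\mathbf{R}/\mathbf{Z}}^2 \Big)^{1/2} = O\Big( \sqrt{\frac{\log \frac{1}{\alpha}}{\mu'}} \Big). \]
By the triangle inequality, we thus conclude that if $\xi_1, \ldots, \xi_m$ are arbitrary elements
of the set $\{\xi \in \mathbf{Z}_p : F(\xi) > \alpha\}$, then
\[ \Big( \sum_{j=1}^n \|(\xi_1 + \cdots + \xi_m) \cdot v_j\|_{\mathbf{R}/\mathbf{Z}}^2 \Big)^{1/2} = O\Big( m \sqrt{\frac{\log \frac{1}{\alpha}}{\mu'}} \Big). \]
If we take $m$ to be $\lfloor c \sqrt{\mu'/\mu} \rfloor$ for some small absolute constant $c > 0$, another application
of (7.1) then gives
\[ G(\xi_1 + \cdots + \xi_m) > \alpha. \]
In other words we have established the sum set inclusion
\[ m\{\xi \in \mathbf{Z}_p : F(\xi) > \alpha\} \subseteq \{\xi \in \mathbf{Z}_p : G(\xi) > \alpha\}. \]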




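Applying the Cauchy-Davenport inequality repeatedly, we have$^1$
\[ \mathbf{P}_{\mathbf{Z}_p}(\{\xi \in \mathbf{Z}_p : G(\xi) > \alpha\}) \ge \min\big( m\, \mathbf{P}_{\mathbf{Z}_p}(\{\xi \in \mathbf{Z}_p : F(\xi) > \alpha\}), 1 \big). \]
If $\alpha > \mathbf{E}_{\mathbf{Z}_p}(G)$, then $\mathbf{P}_{\mathbf{Z}_p}(\{\xi \in \mathbf{Z}_p : G(\xi) > \alpha\}) < 1$ by Markov's inequality,
and hence
\[ \mathbf{P}_{\mathbf{Z}_p}(\{\xi \in \mathbf{Z}_p : F(\xi) > \alpha\}) \le \frac{1}{m} \mathbf{P}_{\mathbf{Z}_p}(\{\xi \in \mathbf{Z}_p : G(\xi) > \alpha\}). \]
Integrating this in $\alpha$, we conclude
\[ \mathbf{E}_{\mathbf{Z}_p}(F\, \mathbf{I}(F > \mathbf{E}_{\mathbf{Z}_p}(G))) \le \frac{1}{m} \mathbf{E}_{\mathbf{Z}_p}(G) = O\Big( \sqrt{\frac{\mu}{\mu'}}\, \mathbf{E}_{\mathbf{Z}_p}(G) \Big). \]
On the other hand, from (7.1) we have the pointwise bound
\[ F(\xi) \le G(\xi)^{\Omega(\mu'/\mu)} \]
and hence
\[ \mathbf{E}_{\mathbf{Z}_p}(F\, \mathbf{I}(F \le \mathbf{E}_{\mathbf{Z}_p}(G))) \le \mathbf{E}_{\mathbf{Z}_p}(G)^{\Omega(\mu'/\mu)}. \]
Adding this to the preceding inequality, we obtain the claim.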
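
The only arithmetic input to this proof is the iterated Cauchy-Davenport bound $|mA| \ge \min(m|A| - (m-1), p)$ for $A \subseteq \mathbf{Z}_p$. A brute-force check for small primes is sketched below; it is an illustration with hypothetical function names, not a proof, and the composite modulus at the end is included only to show where primality matters.

```python
from itertools import combinations


def sumset(A, B, p):
    """A + B computed modulo p."""
    return {(a + b) % p for a in A for b in B}


def iterated_sumset(A, m, p):
    """mA = A + ... + A (m summands) modulo p."""
    S = set(A)
    for _ in range(m - 1):
        S = sumset(S, A, p)
    return S


def check_cauchy_davenport(p, m):
    """Exhaustively test |mA| >= min(m|A| - (m - 1), p) over all non-empty A."""
    for r in range(1, p + 1):
        for A in combinations(range(p), r):
            if len(iterated_sumset(A, m, p)) < min(m * r - (m - 1), p):
                return False
    return True


# In a non-prime modulus the bound can fail: {0, 3} is a subgroup of Z/6Z.
subgroup_doubling = len(iterated_sumset({0, 3}, 2, 6))
```

The failure for $\{0, 3\} \subseteq \mathbf{Z}/6\mathbf{Z}$ (where $|2A| = 2 < 3$) reflects the general obstruction coming from proper subgroups.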





A modification of the above argument gives a more direct bound on $\mathbf{P}(X_v^{(\mu)} = x)$.


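Lemma 7.15 (Halász concentration inequality) [167] Let $Z$ be a cyclic group
of odd prime order, and let $v = (v_1, \ldots, v_n)$ be a tuple in $Z$ with all the $v_j$ non-zero.
Then for any $0 < \mu \le 1$ and $x \in Z$ we have
\[ \mathbf{P}(X_v^{(\mu)} = x) \le O\Big( \frac{1}{\sqrt{\mu n}}\, \mathbf{P}_{\xi \in Z}\Big( \sum_{j=1}^n \cos(2\pi \xi \cdot v_j) \ge \frac{n}{2} \Big) \Big) + \exp(-\Omega(\mu n)). \qquad (7.2) \]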


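Proof Using the domination property we may take $\mu \le 1/2$. By Lemma 7.11
and (7.1) we have
\[ \mathbf{P}(X_v^{(\mu)} = x) \le \mathbf{E}_{\xi \in Z} \prod_{j=1}^n (1 - \mu + \mu \cos(2\pi \xi \cdot v_j)) \le \mathbf{E}_{\xi \in Z} \exp\Big( -\Omega\Big( \mu \sum_{j=1}^n \|\xi \cdot v_j\|_{\mathbf{R}/\mathbf{Z}}^2 \Big) \Big). \]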


$^1$ To be absolutely precise here, we should have written
\[ \mathbf{P}_{\mathbf{Z}_p}(\{\xi \in \mathbf{Z}_p : G(\xi) > \alpha\}) \ge \min\big( m\, \mathbf{P}_{\mathbf{Z}_p}(\{\xi \in \mathbf{Z}_p : F(\xi) > \alpha\}) - (m - 1)/p, 1 \big), \]
since the Cauchy-Davenport inequality only implies $|A + B| \ge \min\{|A| + |B| - 1, p\}$ for any two
subsets $A, B$ of $\mathbf{Z}_p$. However, the term $(m - 1)/p$ is negligible as we can take $p$ arbitrarily large.




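We can subdivide the right-hand side based on the size of $(\sum_{j=1}^n \|\xi \cdot v_j\|_{\mathbf{R}/\mathbf{Z}}^2)^{1/2}$,
and bound the above expression by
\[ O\Big( \sum_{1 \le m \le c \mu n} \exp(-\Omega(m))\, \mathbf{P}_{\xi \in Z}\Big( \Big( \sum_{j=1}^n \|\xi \cdot v_j\|_{\mathbf{R}/\mathbf{Z}}^2 \Big)^{1/2} \le \sqrt{m/\mu} \Big) \Big) + \exp(-\Omega(\mu n)) \]
where $c > 0$ is a small absolute constant. Now observe that
\[ \|\xi \cdot v_j\|_{\mathbf{R}/\mathbf{Z}}^2 = \Theta(1 - \cos(2\pi \xi \cdot v_j)) \qquad (7.3) \]
which in conjunction with Lemma 4.5 gives
\[ \mathbf{E}_{\xi \in Z} \|\xi \cdot v_j\|_{\mathbf{R}/\mathbf{Z}}^2 = \Theta(1). \]
By linearity of expectation we thus have
\[ \mathbf{E}_{\xi \in Z} \sum_{j=1}^n \|\xi \cdot v_j\|_{\mathbf{R}/\mathbf{Z}}^2 = \Theta(n); \]
in particular, we see that $\mathbf{P}_{\xi \in Z}((\sum_{j=1}^n \|\xi \cdot v_j\|_{\mathbf{R}/\mathbf{Z}}^2)^{1/2} \le c\sqrt{n})$ is strictly less than
one if $c$ is small enough. Applying the Cauchy-Davenport inequality as in the
preceding proof, we conclude
\[ \mathbf{P}_{\xi \in Z}\Big( \Big( \sum_{j=1}^n \|\xi \cdot v_j\|_{\mathbf{R}/\mathbf{Z}}^2 \Big)^{1/2} \le \sqrt{m/\mu} \Big) \le O\Big( \sqrt{\frac{m}{\mu n}} \Big)\, \mathbf{P}_{\xi \in Z}\Big( \Big( \sum_{j=1}^n \|\xi \cdot v_j\|_{\mathbf{R}/\mathbf{Z}}^2 \Big)^{1/2} \le c\sqrt{n} \Big). \]
Using (7.3) again, we conclude
\[ \mathbf{P}_{\xi \in Z}\Big( \Big( \sum_{j=1}^n \|\xi \cdot v_j\|_{\mathbf{R}/\mathbf{Z}}^2 \Big)^{1/2} \le \sqrt{m/\mu} \Big) \le O\Big( \sqrt{\frac{m}{\mu n}} \Big)\, \mathbf{P}_{\xi \in Z}\Big( \sum_{j=1}^n \cos(2\pi \xi \cdot v_j) \ge \frac{n}{2} \Big) \]
if $c$ is sufficiently small. The claim then follows from the observation that
\[ \sum_{m \ge 1} \exp(-\Omega(m)) \sqrt{\frac{m}{\mu n}} = O\Big( \frac{1}{\sqrt{\mu n}} \Big) \]
(the geometric decay of $\exp(-\Omega(m))$ being more than sufficient to counteract the
polynomial growth of $\sqrt{m}$).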

















This bound easily implies Corollary 7.13, and is in fact significantly stronger. 
For instance, we have 




Corollary 7.16 [167] Let $0 < \mu \le 1$, and let $n$ be sufficiently large depending on
$\mu$. Let $v = (v_1, \ldots, v_n)$ be a tuple of positive integers. For each integer $j > 0$, let
$m_j$ denote the number of times $j$ occurs in $v$, thus $m_j := |\{1 \le i \le n : v_i = j\}|$.
Then for any $x \in \mathbf{Z}$ we have
\[ \mathbf{P}(X_v^{(\mu)} = x) \le O\Big( \mu^{-1/2} n^{-5/2} \sum_{j > 0} m_j^2 \Big). \]
In particular, if all the $v_i$ are distinct, then
\[ \mathbf{P}(X_v^{(\mu)} = x) \le O(\mu^{-1/2} n^{-3/2}). \]


We remark that in the $\mu = 1$ case, the second half of this Corollary was first
established by combinatorial means in [310] (with the precise threshold given in
[330]).


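Proof We may use a Freiman isomorphism to place $v_1, \ldots, v_n$ inside $\mathbf{Z}_p$ for
some very large prime $p$. A direct application of Parseval's theorem 4.2 gives
\[ \mathbf{E}_{\xi \in \mathbf{Z}_p} \Big| \sum_{j=1}^n \cos(2\pi \xi \cdot v_j) \Big|^2 = O\Big( \sum_{j > 0} m_j^2 \Big) \]
and hence by Markov's inequality
\[ \mathbf{P}_{\xi \in \mathbf{Z}_p}\Big( \sum_{j=1}^n \cos(2\pi \xi \cdot v_j) \ge \frac{n}{2} \Big) = O\Big( \frac{1}{n^2} \sum_{j > 0} m_j^2 \Big). \]
The claim then follows from Lemma 7.15 (observing that $\exp(-\Omega(\mu n)) = O(\mu^{-1/2} n^{-3/2})$
when $n$ is large).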
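
The $n^{-3/2}$ rate for distinct steps can be observed on the model case $v = (1, 2, \ldots, n)$ with $\mu = 1$. The sketch below is an illustration under the assumption that for $\mu = 1$ the walk is $\pm v_1 \pm \cdots \pm v_n$ with independent uniform signs; `max_concentration` is a hypothetical helper that counts sign patterns exactly.

```python
def max_concentration(v):
    """max_x P(+-v_1 +- ... +- v_n = x) for independent uniform signs."""
    counts = {0: 1}  # counts[s] = number of sign patterns with partial sum s
    for step in v:
        new = {}
        for s, c in counts.items():
            new[s + step] = new.get(s + step, 0) + c
            new[s - step] = new.get(s - step, 0) + c
        counts = new
    return max(counts.values()) / 2 ** len(v)


# n^{3/2} * max_x P(X = x) for v = (1, ..., n) should remain bounded.
scaled = [n ** 1.5 * max_concentration(range(1, n + 1)) for n in range(1, 41)]
```

By contrast, for the repeated tuple $v = (1, \ldots, 1)$ the same function decays only like $n^{-1/2}$, which is the regime of Corollary 7.13.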














Exercises 


7.2.1 Show that in the condition $\mu \le \mu'/4$ in the domination inequality of
Corollary 7.12, the constant 4 cannot be replaced by any smaller constant,
even in the most important case $\mu' = 1$.

7.2.2 If $v = (v_1, \ldots, v_n)$ is a tuple of integers, show that
\[ \mathbf{P}(X_v^{(\mu)} = m) = \int_0^1 \cos(2\pi m \xi) \prod_{j=1}^n (1 - \mu + \mu \cos(2\pi v_j \xi)) \, d\xi \]
for all integers $m$.
7.2.3 [167] Let $1 \le k \le n$ and $d \ge 1$, and let $v = (v_1, \ldots, v_n)$, a tuple of vectors
in $\mathbf{R}^d$, be "non-degenerate" in the sense that every proper subspace of $\mathbf{R}^d$
contains at most $n - k$ of the $v_1, \ldots, v_n$. Show that
\[ \mathbf{P}(X_v^{(\mu)} = x) = O_d\big( (\mu k)^{-d/2} \big) \]
for every $0 < \mu \le 1$ and $x \in \mathbf{R}^d$. (Hint: argue as in Corollary 7.13, starting
with an expression such as $\mathbf{P}(X_v^{(\mu)} = 0)$ and applying Hölder's
inequality suitably to arrive at a quantity such as $\mathbf{P}(X^{(\mu)}_{w_1^k \ldots w_d^k} = 0)$, where
$w_1, \ldots, w_d \in \mathbf{R}^d$ are linearly independent.) Give examples that show this
bound is best possible up to the implicit constants in the $O_d()$ notation.

7.2.4 [364] With the notation and assumptions of Lemma 7.14, establish the
following quantitative special case of the Halász inequality:
\[ \mathbf{P}(X_v^{(\mu)} = x) \le \frac{1}{2} \mathbf{P}(X_v^{(\mu/16)} = 0) + \mathbf{P}(X_v^{(\mu/16)} = 0)^4. \]

7.2.5 Show that Lemma 7.14 can fail when $Z$ is a non-cyclic finite group. In
particular, if $Z = \mathbf{F}_3^d$, show that $\mathbf{P}(X_v^{(\mu)} = 0)$ can be comparable to $1/3^d$
for a large range of $\mu$ if the tuple $v$ is chosen appropriately. This shows
the pivotal role played by the Cauchy-Davenport inequality in the Halász
argument.

7.2.6 Show that if the $m_j$ are decreasing in $j$, then the right-hand side of
Corollary 7.16 cannot be improved except for the implicit constant. (Hint:
compute the variance of $X_v^{(\mu)}$.)

7.2.7 Let $0 < \mu \le 1$, and suppose $n$ is sufficiently large depending on $\mu$. Let
$v = (v_1, \ldots, v_n)$ take values in an additive set $S$ in $\mathbf{Z}_p$ for some odd prime
$p$. Show that for any even integer $k \ge 2$ and $x \in \mathbf{Z}_p$ we have
\[ \mathbf{P}(X_v^{(\mu)} = x) \le O_k\Big( \mu^{-1/2} n^{-2k-1/2} \|S\|_{\Lambda(2k)}^{2k} \Big( \sum_{j \in S} m_j^2 \Big)^k \Big) \]
where $m_j$ is the number of times $j$ occurs in $v$, and the $\Lambda(2k)$ constant is
defined in Definition 4.26. In particular, if the $v_i$ are all distinct, then
\[ \mathbf{P}(X_v^{(\mu)} = x) \le O_k\big( \mu^{-1/2} n^{-k-1/2} \|S\|_{\Lambda(2k)}^{2k} \big). \]
Thus $X_v^{(\mu)}$ can only concentrate significantly when the $\Lambda(p)$ constants of
the support of $v$ are large.

7.2.8 [167] Let $0 < \mu \le 1$, and let $n$ be sufficiently large depending on $\mu$.
Let $v_1, \ldots, v_n$ be non-zero integers, and let $k \ge 2$ be an even integer.
Generalize Corollary 7.16 to show that for any $x \in \mathbf{Z}$ we have
\[ \mathbf{P}(X_v^{(\mu)} = x) \le O_k\big( \mu^{-1/2} n^{-2k-1/2} R_k \big) \]
where $R_k$ is the number of solutions to the equation
\[ \epsilon_1 v_{i_1} + \cdots + \epsilon_{2k} v_{i_{2k}} = 0 \]
where $\epsilon_1, \ldots, \epsilon_{2k} \in \{-1, +1\}$ and $i_1, \ldots, i_{2k} \in [1, n]$. In particular, if the
$v_j$ are all distinct and take values in a set $S$, then we have
\[ \mathbf{P}(X_v^{(\mu)} = x) \le O\big( \mu^{-1/2} n^{-9/2} E(S, S) \big). \]
Thus $X_v^{(\mu)}$ can only concentrate significantly when the support has substantial
additive energy. Explain heuristically why this result is related to
the $\mu = 2k/n$ case of Lemma 7.14.


7.3 The Esséen concentration inequality 


In several applications, we are not interested in the probability that a random walk
$X_v^{(\mu)}$ ends up in a specified point, but rather in a region of space such as a cube. In
some "discrete" cases (e.g. when the $v_1, \ldots, v_n$ live in a lattice) one can simply
use the union bound to pass from the former to the latter, but this is not always
the best approach. One useful tool for dealing with concentration in general is a
simple concentration inequality of Esséen.


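Lemma 7.17 (Esséen concentration inequality) [{10/] Let $X$ be a random variable
taking a finite number of values in $\mathbf{R}^d$. Let $x_0 \in \mathbf{R}^d$, and let $R, \varepsilon > 0$. Then
\[ \sup_{x_0 \in \mathbf{R}^d} \mathbf{P}(|X - x_0| \le R) = O\Big( \frac{R}{\sqrt{d}} + \frac{\sqrt{d}}{\varepsilon} \Big)^d \int_{\xi \in \mathbf{R}^d : |\xi| \le \varepsilon} |\mathbf{E}(e(\xi \cdot X))| \, d\xi. \]
Here $e(x) := \exp(2\pi i x)$, $\xi \cdot X$ denotes the usual inner product on $\mathbf{R}^d$, and $|\cdot|$
denotes the usual magnitude.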


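Proof By rescaling $X$ and $R$ by $\varepsilon$ we may take $\varepsilon = \sqrt{d}$. A simple covering
argument (using for instance Corollary 3.15) then shows that it suffices to show
that
\[ \mathbf{P}(|X - x_0| \le c\sqrt{d}) \le O(1)^d \int_{\xi \in \mathbf{R}^d : |\xi| \le \sqrt{d}} |\mathbf{E}(e(\xi \cdot X))| \, d\xi \]
for all $x_0 \in \mathbf{R}^d$ and some small absolute constant $c > 0$. By translating $X$ by $x_0$
(which does not affect the right-hand side) we may take $x_0 = 0$. Now from the
standard Gaussian integral identity
\[ \int_{\xi \in \mathbf{R}^d} e^{-\pi C |\xi|^2} e(\xi \cdot X) \, d\xi = C^{-d/2} e^{-\pi |X|^2 / C} \]
for any $C > 0$, we see that
\[ \Big| \int_{\xi \in \mathbf{R}^d : |\xi| \le \sqrt{d}/2} e^{-\pi C |\xi|^2} e(\xi \cdot X) \, d\xi \Big| = \Omega(1)^d \qquad (7.4) \]
whenever $|X| \le c\sqrt{d}$, if $c$ is chosen sufficiently small and $C$ chosen sufficiently
large. Squaring this we obtain
\[ \int_{\xi \in \mathbf{R}^d : |\xi| \le \sqrt{d}} e(\xi \cdot X) w(\xi) \, d\xi \ge \Omega(1)^d\, \mathbf{I}(|X| \le c\sqrt{d}) \]
where $w(\xi) := \int_{|\xi_1| \le \sqrt{d}/2,\ |\xi - \xi_1| \le \sqrt{d}/2} e^{-\pi C |\xi_1|^2} e^{-\pi C |\xi - \xi_1|^2} \, d\xi_1$. Taking expectations of both
sides we obtain
\[ \int_{\xi \in \mathbf{R}^d : |\xi| \le \sqrt{d}} |\mathbf{E}(e(\xi \cdot X))| w(\xi) \, d\xi \ge \Omega(1)^d\, \mathbf{P}(|X| \le c\sqrt{d}). \]
From (3.8) we see that $w(\xi) = O(1)^d$, and the claim follows.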











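Applying this in particular to the random variable $X_v^{(\mu)}$ for some $v = (v_1, \ldots, v_n)$
and $0 < \mu \le 1$ we obtain the following analog of Lemma 7.11:
\[ \mathbf{P}(|X_v^{(\mu)} - x_0| \le R) = O\Big( \frac{R}{\sqrt{d}} + \frac{\sqrt{d}}{\varepsilon} \Big)^d \int_{\xi \in \mathbf{R}^d : |\xi| \le \varepsilon} \prod_{j=1}^n |1 - \mu + \mu \cos(2\pi \xi \cdot v_j)| \, d\xi. \qquad (7.5) \]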
As an application we present a higher-dimensional analog of Corollary 7.10, 
but with the loss of a dimension-dependent constant. 


Proposition 7.18 [207], [167] Let $0 < \mu \le 1$, and suppose $n$ is sufficiently large
depending on $\mu$. Let $v_1, \ldots, v_n$ be elements of $\mathbf{R}^d$ with $|v_i| \ge 1$ for all $i$. Then for
any $x_0 \in \mathbf{R}^d$, we have
\[ \mathbf{P}(|X_v^{(\mu)} - x_0| \le k) \le O(1)^d \frac{k}{\sqrt{\mu n}} \]
for all $k \ge 1$.


It is worth noting that the right-hand side grows only linearly in $k$, instead of the
$k^d$ type growth that one might naively expect. This is a reflection of the heuristic
that the random variable $X_v^{(\mu)}$ tends to concentrate the strongest on one-dimensional
spaces (cf. Exercise 7.2.3).


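Proof In view of (7.5) (with $R = k$ and $\varepsilon = 1/k$), it suffices to show that
\[ \int_{|\xi| \le 1/k} \prod_{j=1}^n |1 - \mu + \mu \cos(2\pi \xi \cdot v_j)| \, d\xi = O\Big( \frac{1}{k\sqrt{d}} \Big)^d \frac{k}{\sqrt{\mu n}}. \]
Applying Hölder's inequality, we reduce to showing that
\[ \int_{|\xi| \le 1/k} |1 - \mu + \mu \cos(2\pi \xi \cdot v_j)|^n \, d\xi = O\Big( \frac{1}{k\sqrt{d}} \Big)^d \frac{k}{\sqrt{\mu n}} \]
for each $1 \le j \le n$. We can estimate
\[ |1 - \mu + \mu \cos(2\pi \xi \cdot v_j)| \le \exp\big( -\Omega(\mu \|2\xi \cdot v_j\|_{\mathbf{R}/\mathbf{Z}}^2) \big) \]
(cf. (7.1)) and then make the change of variables $t = 2\xi \cdot v_j$ (using (3.8) to estimate
the volume of the $(d-1)$-dimensional balls that are integrated out) to reduce to
showing the one-dimensional estimate
\[ \frac{1}{|v_j|} \int_{|t| \le 2|v_j|/k} \exp\big( -\Omega(\mu n \|t\|_{\mathbf{R}/\mathbf{Z}}^2) \big) \, dt = O\Big( \frac{1}{\sqrt{\mu n}} \Big). \]
Subdividing the $t$ variable into unit intervals and using the periodicity of $\|t\|_{\mathbf{R}/\mathbf{Z}}$ and
the hypothesis $|v_j| \ge 1$, the claim then follows from the easily verified estimate
\[ \int_{|t| \le 1/2} \exp(-\Omega(\mu n |t|^2)) \, dt = O\Big( \frac{1}{\sqrt{\mu n}} \Big). \]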
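
For completeness, the last one-dimensional estimate can be verified by enlarging the range of integration and using the standard Gaussian integral. In the sketch below, $c > 0$ is a stand-in (our notation, not the text's) for the absolute constant implicit in the $\Omega()$ notation.

```latex
\[
  \int_{|t| \le 1/2} \exp(-c \mu n t^2)\, dt
  \;\le\; \int_{\mathbf{R}} \exp(-c \mu n t^2)\, dt
  \;=\; \sqrt{\frac{\pi}{c \mu n}}
  \;=\; O\Big( \frac{1}{\sqrt{\mu n}} \Big).
\]
```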














One can similarly develop analogs of many of the results of the preceding 
section, though the analysis is a little more technical as the analogs of Corollary 7.12 
are somewhat messier. See [167] for further development of this theory. 


Exercises 


7.3.1 Prove (7.4). 
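7.3.2 Establish the following dimension-independent analog of the Esséen
concentration inequality:
\[ \sup_{x_0 \in \mathbf{R}^d} \mathbf{E}\big( e^{-\pi |X - x_0|^2} \big) \le \int_{\xi \in \mathbf{R}^d} |\mathbf{E}(e(\xi \cdot X))| e^{-\pi |\xi|^2} \, d\xi. \]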

7.3.3 [367] Obtain an analog of Exercise 7.2.3 for the probability $\mathbf{P}(X_v^{(\mu)} \in B)$
for some unit ball $B$, assuming that, for every proper subspace of $\mathbf{R}^d$, at
most $n - k$ of the vectors lie within a unit distance of this subspace.

7.3.4 Use the previous exercise to develop an analog of Erdős's results in any
dimension [108, 367].


7.4 Inverse Littlewood—Offord results 


In the preceding sections we considered direct Littlewood-Offord results, in
which some assumptions were made on the steps $v = (v_1, \ldots, v_n)$, and as a
conclusion some upper bounds were obtained for concentration probabilities such
as $\mathbf{P}(X_v^{(\mu)} = x)$. In many applications it is of more interest to establish inverse
Littlewood-Offord results, in which a lower bound on a concentration probability



is assumed, and some structural property of $v$ is deduced as a consequence. Of
course, every direct Littlewood-Offord result can be converted into an inverse by
taking contrapositives. For instance, from Corollary 7.13 we know that if $v_1, \ldots, v_n$
live in a torsion-free group $Z$ and
\[ \mathbf{P}(X_v^{(\mu)} = x) \ge \frac{1}{\sqrt{\mu k}} \]
for some $0 < \mu \le 1$ and some $x \in Z$, then at most $O(k)$ of the steps $v_1, \ldots, v_n$
are non-zero. Similarly, from Corollary 7.16, we see that if $v_1, \ldots, v_n$ are positive
integers and $\mathbf{P}(X_v^{(\mu)} = x)$ is much larger than $\mu^{-1/2} n^{-3/2}$ for some $0 < \mu \le 1$ and
$x \in \mathbf{Z}$, then at least two of the $v_j$ are equal (in fact one can easily establish that a
large number of pairs $(v_i, v_j)$ must be equal).

Now we consider inverse Littlewood-Offord theorems that give more structure
on the steps $v_1, \ldots, v_n$. The results in this section can be viewed in analogy with
inverse sum set estimates, in which one assumes that a certain set $A$ has small
doubling constant and concludes some structural information on $A$, for instance
containing $A$ inside a progression. For simplicity we shall focus on the case $\mu = 1$
(though one can use results such as Corollary 7.12 or Lemma 7.14 to then extend
to more general $\mu$).

Let us start with an example when $\max_x \mathbf{P}(X_v^{(1)} = x)$ is large. This example has
been the main motivation of our results.


Example 7.19 Let $P$ be a symmetric generalized arithmetic progression of (constant)
rank $d$ and volume $V$ in $Z$. Let $v_1, \ldots, v_n$ be (not necessarily different)
elements of $P$. Then the sum $\sum_{i=1}^n \epsilon_i v_i$ takes values in the generalized arithmetic
progression $nP$, which has volume $n^d V$. From the pigeonhole principle it follows
that
\[ \max_x \mathbf{P}(X_v^{(1)} = x) \ge n^{-d} V^{-1}. \qquad (7.6) \]
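
The pigeonhole bound (7.6) is easy to confirm on a small instance. The sketch below is illustrative only: it assumes that the volume of $\{a_1 w_1 + \cdots + a_d w_d : |a_i| \le L_i\}$ is $\prod_i (2L_i + 1)$, takes $Z$ to be the integers, and uses hypothetical helper names and an arbitrary fixed random seed.

```python
import random
from itertools import product


def max_concentration(v):
    """max_x P(+-v_1 +- ... +- v_n = x) for independent uniform signs."""
    counts = {0: 1}
    for step in v:
        new = {}
        for s, c in counts.items():
            new[s + step] = new.get(s + step, 0) + c
            new[s - step] = new.get(s - step, 0) + c
        counts = new
    return max(counts.values()) / 2 ** len(v)


def symmetric_gap(generators, sizes):
    """Elements and volume of {sum a_i w_i : |a_i| <= L_i} (assumed volume convention)."""
    ranges = [range(-L, L + 1) for L in sizes]
    elements = sorted({sum(a * w for a, w in zip(coeffs, generators))
                       for coeffs in product(*ranges)})
    volume = 1
    for L in sizes:
        volume *= 2 * L + 1
    return elements, volume


rng = random.Random(0)                         # fixed seed for reproducibility
elements, V = symmetric_gap([1, 100], [3, 2])  # rank d = 2, volume V = 7 * 5
n, d = 12, 2
v = [rng.choice(elements) for _ in range(n)]   # steps drawn from the progression
lower_bound = 1 / (n ** d * V)                 # right-hand side of (7.6)
```

Whatever steps are drawn, all signed sums lie in $nP$, so `max_concentration(v)` is at least `lower_bound`.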


The above example shows that if the elements of $v$ belong to a generalized
arithmetic progression with small rank and small volume then $\max_x \mathbf{P}(X_v^{(1)} = x)$ is large. One
might hope that the inverse of this also holds, namely,

If $\max_x \mathbf{P}(X_v^{(1)} = x)$ is large, then the elements of $v$ belong to a generalized arithmetic
progression with small rank and small volume.

We are going to present a few results which support this statement. Let us first
give a simple, but rather weak, result.


Proposition 7.20 Let $v = (v_1, \ldots, v_n)$ be a tuple in an additive group $Z$ which
is either torsion-free or finite of odd order, such that $\mathbf{P}(X_v^{(1)} = x) > 2^{-d-1}$ for
some $x \in Z$ and $d \ge 0$. Then all the steps $v_1, \ldots, v_n$ are contained in a cube
$[-1, 1]^d \cdot (w_1, \ldots, w_d)$ of dimension $d$.




Proof Suppose the conclusion failed. Then from Lemma 4.35 we see that $v$ must
contain a dissociated subword $w = (w_1, \ldots, w_{d+1})$ of length $d + 1$. By conditioning
on the variables not associated to $w$, we observe that
\[ \mathbf{P}(X_v^{(1)} = x) \le \sup_{y \in Z} \mathbf{P}(X_w^{(1)} = y). \]
On the other hand, since $w$ is dissociated, and $Z$ has no 2-torsion, all the sums in $X_w^{(1)}$
are distinct and so $\mathbf{P}(X_w^{(1)} = y) \le 2^{-d-1}$, thus yielding the desired contradiction.














In practice, this proposition is not very useful because the dimension $d$ of
the cube can be rather large (typically it is like $\log n$). However, one can lower
its dimension by increasing the side lengths, and allowing some exceptional steps
$v_j$ to lie outside of the resulting progression.


Proposition 7.21 Let $Z$ be either torsion-free or finite of odd order. For any integer
$d \ge 1$, there is a positive constant $\delta_d$ such that the following holds. Let $k \ge 2$ be
an integer, let $x \in Z$, and let $v = (v_1, \ldots, v_n)$ be a tuple in $Z$. Then either
\[ \mathbf{P}(X_v^{(1)} = x) \le \delta_d k^{-d} \]
or there exists a progression $P = [-k, k]^{d-1} \cdot (w_1, \ldots, w_{d-1})$ in $Z$ such that for
all but at most $k^2$ exceptional values of $j \in [1, n]$, there exists $a_0 \in [1, k]$ such that
$a_0 v_j \in P$.

Note that Corollary 7.13 (with $\mu = 1$) can be thought of as the $d = 1$ case of
this proposition, while Proposition 7.20 can be viewed as the limiting case $k = 1$.
Of course one should take $k \le \sqrt{n}$ to avoid the claim being vacuous.


Proof Call a tuple $(w_1, \ldots, w_r)$ $k$-dissociated if the progression
$[-k, k]^r \cdot (w_1, \ldots, w_r)$ is proper. We now construct a $k$-dissociated tuple $(w_1, \ldots, w_r)$
for some $0 \le r \le d$ by the following algorithm.

• Step 0. Initialize $r = 0$. In particular, $(w_1, \ldots, w_r)$ is trivially $k$-dissociated,
and from Corollary 7.12 we have
\[ \mathbf{P}\big( X^{(1/4d)}_{v^{d-r} w_1^{k^2} \ldots w_r^{k^2}} = 0 \big) \ge \mathbf{P}(X_v^{(1)} = x). \qquad (7.7) \]
• Step 1. Count how many $1 \le j \le n$ there are such that $(w_1, \ldots, w_r, v_j)$ is
$k$-dissociated. If this number is less than $k^2$, halt the algorithm. Otherwise,
move on to Step 2.
• Step 2. Applying Corollary 7.12, we can locate a $v_j$ such that $(w_1, \ldots, w_r, v_j)$
is $k$-dissociated, and
\[ \mathbf{P}\big( X^{(1/4d)}_{v^{d-r} w_1^{k^2} \ldots w_r^{k^2}} = 0 \big) \le \mathbf{P}\big( X^{(1/4d)}_{v^{d-r-1} w_1^{k^2} \ldots w_r^{k^2} v_j^{k^2}} = 0 \big). \]



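We then set $w_{r+1} := v_j$ and increase $r$ to $r + 1$. Return to Step 1. Note that
$(w_1, \ldots, w_r)$ remains $k$-dissociated, and (7.7) remains true, when doing so.

Suppose that we terminate at some step $r \le d - 1$. Then we have an $r$-tuple
$(w_1, \ldots, w_r)$ which is $k$-dissociated, but such that $(w_1, \ldots, w_r, v_j)$ is $k$-dissociated
for at most $k^2$ values of $v_j$. Unwinding the definitions, this shows that for all but
at most $k^2$ values of $v_j$, there exists $a_0 \in [1, k]$ such that $a_0 v_j \in Q - Q$, where
$Q := [0, k]^r \cdot (w_1, \ldots, w_r)$ and $r \le d - 1$. The claim then follows by adding some
dummy vectors to the $w_j$.

Now we prove that we must indeed terminate at some step $r \le d - 1$. Assume
(for a contradiction) that we have reached step $d$. Then we have a $k$-dissociated
tuple $(w_1, \ldots, w_d)$ such that
\[ \mathbf{P}(X_v^{(1)} = x) \le \mathbf{P}\big( X^{(1/4d)}_{w_1^{k^2} \ldots w_d^{k^2}} = 0 \big). \]
Let $\Gamma \subset \mathbf{Z}^d$ be the lattice
\[ \Gamma := \{(m_1, \ldots, m_d) \in \mathbf{Z}^d : m_1 w_1 + \cdots + m_d w_d = 0\}, \]
then by using independence we can write
\[ \mathbf{P}\big( X^{(1/4d)}_{w_1^{k^2} \ldots w_d^{k^2}} = 0 \big) = \sum_{(m_1, \ldots, m_d) \in \Gamma} \prod_{j=1}^d \mathbf{P}\big( X^{(1/4d)}_{1^{k^2}} = m_j \big) \qquad (7.8) \]
where $X^{(1/4d)}_{1^{k^2}} = \eta_1^{(1/4d)} + \cdots + \eta_{k^2}^{(1/4d)}$, with the $\eta_i^{(1/4d)}$ i.i.d.

Now we use a volume-packing argument. A simple computation involving
the binomial formula (or induction on the $k^2$ parameter) shows that the expression
$\mathbf{P}(X^{(1/4d)}_{1^{k^2}} = m)$ is even in $m$, and decreasing for positive $m$. It is also
$\Theta_d(1/k)$ when $|m| \le k$ (this can be seen either from Stirling's formula (1.52),
or from Corollary 7.13 and variance and monotonicity considerations). Thus we
have
\[ \mathbf{P}\big( X^{(1/4d)}_{1^{k^2}} = m \big) = O_d\Big( \frac{1}{k} \sum_{m' \in m + (-k/2, k/2)} \mathbf{P}\big( X^{(1/4d)}_{1^{k^2}} = m' \big) \Big) \]
and hence from (7.8) we have
\[ \mathbf{P}(X_v^{(1)} = x) \le O_d\Big( k^{-d} \sum_{(m_1, \ldots, m_d) \in \Gamma}\ \sum_{(m_1', \ldots, m_d') \in (m_1, \ldots, m_d) + (-k/2, k/2)^d}\ \prod_{j=1}^d \mathbf{P}\big( X^{(1/4d)}_{1^{k^2}} = m_j' \big) \Big). \]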




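Since $(w_1, \ldots, w_d)$ is $k$-dissociated, all the $(m_1', \ldots, m_d')$ tuples in
$\Gamma + (-k/2, k/2)^d$ are different. Thus, we conclude
\[ \mathbf{P}(X_v^{(1)} = x) \le O_d\Big( k^{-d} \sum_{(m_1, \ldots, m_d) \in \mathbf{Z}^d} \prod_{j=1}^d \mathbf{P}\big( X^{(1/4d)}_{1^{k^2}} = m_j \big) \Big). \]
But from the union bound we have
\[ \sum_{(m_1, \ldots, m_d) \in \mathbf{Z}^d} \prod_{j=1}^d \mathbf{P}\big( X^{(1/4d)}_{1^{k^2}} = m_j \big) \le 1. \]
To complete the proof, set the constant $\delta_d$ in the proposition to be larger than the
hidden constant in $O_d(k^{-d})$.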
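
The combinatorial skeleton of the selection algorithm in this proof (Steps 0 to 2) can be sketched as follows. This is a simplified illustration and not the authors' procedure: it works over the integers, tests properness by brute force, and in Step 2 simply takes the first admissible $v_j$ rather than one satisfying the probabilistic inequality obtained from Corollary 7.12. All function names are hypothetical.

```python
from itertools import product


def is_k_dissociated(w, k):
    """True if [-k, k]^r . w is proper, i.e. all (2k+1)^r sums are distinct."""
    seen = set()
    for coeffs in product(range(-k, k + 1), repeat=len(w)):
        s = sum(a * x for a, x in zip(coeffs, w))
        if s in seen:
            return False
        seen.add(s)
    return True


def greedy_k_dissociated(v, k, d):
    """Grow a k-dissociated tuple from the steps v, following Steps 0-2.

    Halts early when fewer than k^2 steps can extend the current tuple
    (Step 1), or stops once the tuple has length d.
    """
    w = []                                   # Step 0: r = 0
    while len(w) < d:
        candidates = [x for x in v if is_k_dissociated(w + [x], k)]
        if len(candidates) < k * k:          # Step 1: too few extensions, halt
            break
        w.append(candidates[0])              # Step 2 (simplified selection rule)
    return w
```

For example, with steps $1, 2, \ldots, 29$, $k = 2$ and $d = 3$ this sketch returns $(1, 5, 25)$, reflecting the uniqueness of balanced base-5 expansions with digits in $[-2, 2]$.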














The $a_0$ factor in the above proposition is somewhat undesirable. With some
more effort, one can remove this factor, but at the cost of enlarging the progression
somewhat.


Theorem 7.22 (Inverse Littlewood-Offord theorem) [366] Let $0 < \mu \le 1$
and let $\alpha$ and $A$ be arbitrary positive constants. Then there is a constant
$B = B(\mu, \alpha, A)$ such that the following holds. Assume that $v = (v_1, \ldots, v_n)$ is
a tuple of rational numbers satisfying $\max_x \mathbf{P}(X_v^{(\mu)} = x) \ge n^{-A}$. Then there is a
generalized arithmetic progression $P$ of rational numbers of rank at most $B$ and
volume at most $n^B$ which contains all but at most $Bn^\alpha$ elements of $v$.


The proof of Theorem 7.22 is somewhat lengthy but is a modification of that 
of Proposition 7.21. For details see [366]. 

An inverse theorem in a similar spirit for the relative Halász inequality, 
Lemma 7.14, was also obtained in [365]: 


Theorem 7.23 (Inverse Halász inequality) [365] Let Z be either torsion-free 
or cyclic of odd prime order. Let v = (vı, ..., vn) be a tuple in Z, and suppose 
that £o > £, > O are such that 


P (AY = 0) > &P (XE ey = 0) 


and 
xX) — 3 i 
P( y =0)> q + 260 " 


Then there exists a proper progression P of rank Oss (1) and volume 


1 ; P 
Osos Sq) which contain the v1, ..., Un. 


In fact some additional structural information was obtained, namely that
the $v_1, \ldots, v_n$ are mostly contained in the "core" of the progression $P$, and
under certain "non-triviality" assumptions on $v$ (basically, that the set of signs
$(\eta_1, \ldots, \eta_n) \in \{-1, 1\}^n$ for which $\eta_1 v_1 + \cdots + \eta_n v_n = 0$ has to span the hyperplane)
one can also place the $v_j$ in an arithmetic progression of length $n^{O(1)}$.
For more precise statements and proofs see [365]. The main point is to inspect
the use of the Cauchy-Davenport inequality in the proof of Lemma 7.14, and
observe that this inequality is only efficient when sets such as $\{\xi \in \mathbf{Z}_p : F(\xi) > \alpha\}$
have small doubling constant. This in turn can be used (via some duality arguments)
to place the $v_1, \ldots, v_n$ in a "Bohr set" of small doubling constant, at
which point one can apply a Freiman-type theorem (e.g. Theorem 5.44) to place
the $v_j$ in a progression. This result played an essential role in establishing the
bound $\mathbf{P}(\det(M_n) = 0) = (\frac{3}{4} + o(1))^n$ for $n \times n$ random Bernoulli matrices; see
Section 7.5 for further discussion.


Exercise 


7.4.1 Let the notation and hypotheses be as in Proposition 7.21, and let $1 \le m \le k$.
Show that either
\[ \mathbf{P}(X_v^{(1)} = x) = O_d(m k^{-d-1}) \]
or there exists a progression $P = [-k, k]^{d-1} \cdot (w_1, \ldots, w_{d-1})$ in $Z$ such
that for all but at most $k^2$ exceptional values of $j \in [1, n]$, there exist at
least $k/m$ values $a_0 \in [1, k]$ such that $a_0 v_j \in P$. (Hint: argue as in Proposition 7.21,
but work with $k/2$-dissociated tuples instead of $k$-dissociated
ones, and add one extra copy of $v$ in (7.7). Then if the latter conclusion
fails, use Corollary 7.12 one final time to exploit the sparseness of the $a_0$
for which $a_0 v_j \in P$ and thence obtain the former conclusion.)


7.5 Random Bernoulli matrices 


Let $M_n$ be the random $n \times n$ matrix whose entries are independent uniformly distributed
signs $\pm 1$ ($M_n$ is often referred to as the random Bernoulli matrix). The
distribution of several quantities relating to $M_n$, such as its determinant and singular
values, is of interest to a number of fields, including theoretical physics, combinatorics
and theoretical computer science. It turns out that the tools developed in
earlier sections are very well adapted for the study of $M_n$.

In this section we focus on a specific problem, namely to understand the singularity
probability $\mathbf{P}(\det(M_n) = 0)$. An equivalent formulation is: given $n$ vectors
$X_1, \ldots, X_n$ chosen uniformly at random from the unit cube $\{-1, 1\}^n \subset \mathbf{R}^n$, what
is the probability that these vectors are linearly dependent?
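
For very small $n$ the singularity probability can be computed by exhaustive enumeration. The sketch below is illustrative only and uses hypothetical function names; it evaluates integer determinants exactly with fraction-free (Bareiss) elimination. Such small values of $n$ are far from the asymptotic regime and say nothing about the behaviour for large $n$.

```python
from itertools import product


def int_det(rows):
    """Exact integer determinant via fraction-free (Bareiss) elimination."""
    a = [list(r) for r in rows]
    n = len(a)
    sign, prev = 1, 1
    for k in range(n - 1):
        if a[k][k] == 0:
            # Look for a row below with a non-zero entry in column k.
            swap = next((i for i in range(k + 1, n) if a[i][k] != 0), None)
            if swap is None:
                return 0
            a[k], a[swap] = a[swap], a[k]
            sign = -sign
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                # The division by the previous pivot is exact in Bareiss' scheme.
                a[i][j] = (a[i][j] * a[k][k] - a[i][k] * a[k][j]) // prev
        prev = a[k][k]
    return sign * a[n - 1][n - 1]


def singular_count(n):
    """Number of singular n x n matrices with entries in {-1, +1}."""
    return sum(
        1
        for entries in product((-1, 1), repeat=n * n)
        if int_det([entries[i * n:(i + 1) * n] for i in range(n)]) == 0
    )
```

Enumeration gives 8 singular matrices out of 16 for $n = 2$ and 320 out of 512 for $n = 3$, that is, singularity probabilities $1/2$ and $5/8$.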




This simple-sounding problem has turned out to be surprisingly non-trivial. It
is easy enough to show that
\[ \mathbf{P}(X_i = \pm X_j \text{ for some } 1 \le i < j \le n \text{ and sign } \pm) = (1 + o(1)) n^2 2^{-n}. \qquad (7.9) \]
A similar argument (taking into account both the rows and columns of $M_n$)
gives
\[ \mathbf{P}(\det(M_n) = 0) \ge (2 + o(1)) n^2 2^{-n}. \qquad (7.10) \]
It is conjectured that this is sharp; thus

Conjecture 7.24 $\mathbf{P}(\det(M_n) = 0) = (2 + o(1)) n^2 2^{-n}$. In particular, $\mathbf{P}(\det(M_n) = 0) = (\frac{1}{2} + o(1))^n$.

This conjecture remains open, although we will discuss some progress on this
problem in this section. Notice that $M_n$ is singular if and only if there is a non-zero
vector $v \in \mathbf{R}^n$ such that $M_n v = 0$. By restricting $v$ to some special sets of vectors,
we can obtain the conjectured bound $(1/2 + o(1))^n$. The following result is due to
Komlós.


Theorem 7.25 Let $n \ge 3$, and let $\Omega_1$ be the set of vectors in $\mathbf{R}^n$ with at least
$3n/\log_2 n$ zero coordinates. The probability that $M_n v = 0$ for some non-zero $v \in \Omega_1$
is $(1 + o(1)) n^2 2^{-n}$.


By considering the transpose of $M_n$, one can see that this theorem is equivalent
to the following lemma.


Lemma 7.26 Let $n \ge 3$, and let $E$ denote the event that $a_1 X_1 + \cdots + a_n X_n = 0$
for some non-zero $(a_1, \ldots, a_n) \in \Omega_1$. Then $\mathbf{P}(E) = (1 + o(1)) n^2 2^{-n}$.


Proof To establish the upper bound, we use the union bound to give
\[ \mathbf{P}(E) \le \sum_{2 \le k \le n - 3n/\log_2 n} \mathbf{P}(E_k \setminus E_{k-1}) \]
where $E_k$ is the event that $a_1 X_1 + \cdots + a_n X_n = 0$ for some $(a_1, \ldots, a_n) \in \mathbf{R}^n$
with exactly $k$ of the $a_i$ being non-zero. (Note that the event $E_1$ is vacuous.) From
(7.9) we easily see that $\mathbf{P}(E_2) = (1 + o(1)) n^2 2^{-n}$, so it will suffice to show that
\[ \sum_{3 \le k \le n - 3n/\log_2 n} \mathbf{P}(E_k \setminus E_{k-1}) = o(n^2 2^{-n}). \]
From symmetry we have $\mathbf{P}(E_k \setminus E_{k-1}) \le \binom{n}{k} \mathbf{P}(F_k \setminus E_{k-1})$, where $F_k$ is the event that
$a_1 X_1 + \cdots + a_k X_k = 0$ for some non-zero $a_1, \ldots, a_k$. If $F_k \setminus E_{k-1}$ occurs, then
the $n \times k$ matrix whose columns are $X_1, \ldots, X_k$ has rank exactly $k - 1$, and so
$(a_1, \ldots, a_k)$ is essentially the wedge product of $k - 1$ of the rows of this matrix.
There are $\binom{n}{k-1}$ ways to choose these rows, and then, on fixing all the entries of
those rows (and hence fixing $a_1, \ldots, a_k$), we see from Corollary 7.4 that each of the
other $n - k + 1$ rows will be consistent with the equation $a_1 X_1 + \cdots + a_k X_k = 0$
with probability at most $\binom{k}{\lfloor k/2 \rfloor} / 2^k$. We conclude that
\[ \sum_{3 \le k \le n - 3n/\log_2 n} \mathbf{P}(E_k \setminus E_{k-1}) \le \sum_{3 \le k \le n - 3n/\log_2 n} \binom{n}{k} \binom{n}{k-1} \Big( \frac{\binom{k}{\lfloor k/2 \rfloor}}{2^k} \Big)^{n-k+1}. \]
The claim then follows by direct computation (estimating $\binom{k}{\lfloor k/2 \rfloor} / 2^k$ by $O(1/\sqrt{n})$
when $k = \Theta(n)$).














Let us consider another restricted class. Let $\Omega_2$ be the set of integer vectors
in $\mathbf{R}^n$ where the coordinates have absolute values at most $n^C$, for some positive
constant $C$.


Theorem 7.27 The probability that $M_n v = 0$ for some non-zero $v \in \Omega_2$ is
$(1/2 + o(1))^n$. (The error term $o(1)$ depends of course on $C$.)


Proof The lower bound is trivial so we focus on the upper. For each non-zero
vector $v$, let $p(v)$ be the probability that $X \cdot v = 0$, where $X$ is a random Bernoulli
vector. It is trivial that $\mathbf{P}(M_n v = 0) = p(v)^n$. Since a hyperplane can contain at
most $2^{n-1}$ $\pm 1$ vectors, $p(v)$ is at most $1/2$. For $j = 1, 2, \ldots$ let $S_j$ be the number
of non-zero vectors $v$ in $\Omega_2$ such that $2^{-j-1} < p(v) \le 2^{-j}$. Then the probability
that $M_n v = 0$ for some non-zero $v \in \Omega_2$ is at most
\[ \sum_{j=1}^{\infty} 2^{-jn} S_j. \]
Let us now restrict the range of $j$. Notice that if $p(v) \ge n^{-1/3}$, then by Corollary 7.4
most of the coordinates of $v$ are zero and then by Theorem 7.25 the contribution
from these $v$ is at most $(1/2 + o(1))^n$. Next, since the number of vectors in $\Omega_2$ is
at most $(2n^C + 1)^n \le n^{(C+1)n}$, we can ignore those $j$ where $2^{-j} \le n^{-C-2}$. Now it
suffices to show
\[ \sum_{n^{-C-2} \le 2^{-j} \le n^{-1/3}} 2^{-jn} S_j = o((1/2)^n). \]
Let $\varepsilon$ be a small positive constant (say .001). As we have $j = O(\log n)$ for all
relevant $j$, we can find an integer $d = O(1)$ such that
\[ n^{-(d - 1 + 1/3)\varepsilon} > 2^{-j} \ge n^{-(d + 1/3)\varepsilon}. \]
(The value of $d$ depends on $j$, but is bounded from above by a constant.) Set
$k = n^{\varepsilon}$. Thus $2^{-j} \gg k^{-d}$ and we can use Proposition 7.21 to estimate $S_j$. Indeed,
by invoking this proposition, we see that there are at most $\binom{n}{k^2} (2n^C + 1)^{k^2} = n^{O(k^2)} = n^{o(n)}$
ways to choose the positions and values of exceptional coordinates of $v$. There
are only $(2n^C + 1)^{d-1} = n^{O(1)}$ ways to fix the generalized progression $P$. Once
$P$ is fixed, the number of ways to set the rest of the coordinates of $v$ is at most
$|P|^n = (2k + 1)^{(d-1)n}$. Putting these together,
\[ S_j \le O(1)^n n^{o(n)} k^{(d-1)n}. \]
Since $k = n^{\varepsilon}$ and $2^{-j} \le n^{-(d - 1 + 1/3)\varepsilon}$, it follows that
\[ 2^{-jn} S_j \le O(1)^n n^{o(n)} n^{-\varepsilon n / 3} = n^{-\Omega(n)}. \]
As the number of $j$s is only $O(\log n)$, and $n^{-\Omega(n)} \log n = o((1/2)^n)$, we are done.














By combining Theorem 7.25 with Corollary 7.13, we have the following 
consequence. 


Corollary 7.28 [215] Let $n \ge 3$. Then for any $1 \le i \le n$ we have
\[ \mathbf{P}(X_i \text{ is a linear combination of } X_1, \ldots, X_{i-1}) \le \min\Big( 2^{i-n-1}, O\Big( \frac{1}{\sqrt{n}} \Big) \Big). \]

Proof Let us first prove the upper bound of $2^{i-n-1}$. Note that $X_1, \ldots, X_{i-1}$ span
a space of dimension at most $i - 1$, and so there exist $i - 1$ coordinates which
determine all the other coordinates of the space. But if one fixes $i - 1$ coordinates
of $X_i$ then $X_i$ is still uniformly distributed among $2^{n-i+1}$ remaining points,
and the claim follows. Now we prove the bound of $O(1/\sqrt{n})$. We may assume $n$ is
large and $i$ is close to $n$ (say $i > .9n$). The vectors $X_1, \ldots, X_{i-1}$ will be contained
in at least one hyperplane $\{(x_1, \ldots, x_n) \in \mathbf{R}^n : a_1 x_1 + \cdots + a_n x_n = 0\}$; choose
one arbitrarily. By Theorem 7.25, we certainly will have $\Theta(n)$ of the coordinates
non-zero with probability $1 - O(1/\sqrt{n})$ (in fact, we can have much higher probability
here). By Corollary 7.13, the probability that $X_i \cdot (a_1, \ldots, a_n) = 0$ is at most
$O(1/\sqrt{n})$. Since this event is necessary in order for $X_i$ to be a linear combination of
$X_1, \ldots, X_{i-1}$, the claim follows.














From this corollary, Bayes' identity, and independence, one easily verifies that

$$P(\det(M_n) = 0) \le \sum_{i=2}^{n} P(X_i \text{ is a linear combination of } X_1, \ldots, X_{i-1}) \le O\left(\frac{1}{\sqrt{n}}\right) \log n$$

for large $n$. This bound was sharpened slightly to $O(1/\sqrt{n})$ in [215], [216] by a variant
of this method.







Using a refinement of this argument, one can in fact obtain the following estimate
for the determinant [364]:

$$P\left(|\det(M_n)| = \sqrt{n!} \exp\left(O\left(n^{1/2} \log^{1/2} n\right)\right)\right) = 1 - o(1).$$

The right-hand side is nearly optimal (see Exercises 7.5.3 and 7.5.4). With the
help of recent results from [366], one can have $o(1) = 1/n^C$ for any fixed $C$, at the
cost of changing the hidden constant in the $O$ on the left-hand side. It is not clear,
however, that one can have $o(1) = \exp(-\Omega(n))$.

Now let us present a breakthrough result of Kahn, Komlós, and Szemerédi 
[195], which established an exponential bound without any restriction. 


Theorem 7.29 [195] There is a positive constant $\epsilon$ such that $P(\det(M_n) = 0) \le (1 - \epsilon)^n$.


In fact the explicit value $\epsilon = 0.001$ was obtained in [195]. This was improved
to roughly $\epsilon = 0.042$ in [364], and then to $\epsilon = 1/4 + o(1)$ in [365]. Conjecture 7.24
asserts that one can take $\epsilon = 1/2 + o(1)$, which would be best possible.
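
For very small $n$ the singularity probability can be computed exactly by brute force, which gives a concrete feel for the quantity being bounded. The following is only an illustrative sketch: the helper names `det_int`, `all_sign_matrices` and `singular_count` are hypothetical and not taken from the text, and exhaustive enumeration is feasible only for tiny $n$, so it says nothing about the asymptotic regime of Theorem 7.29.

```python
from itertools import product

def det_int(rows):
    """Exact integer determinant via fraction-free (Bareiss) elimination."""
    m = [list(r) for r in rows]
    n = len(m)
    sign, prev = 1, 1
    for k in range(n - 1):
        if m[k][k] == 0:
            # Look for a row below with a non-zero entry in column k.
            swap = next((r for r in range(k + 1, n) if m[r][k] != 0), None)
            if swap is None:
                return 0  # no usable pivot in this column: determinant is 0
            m[k], m[swap] = m[swap], m[k]
            sign = -sign
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                # Bareiss update; the division by the previous pivot is exact.
                m[i][j] = (m[i][j] * m[k][k] - m[i][k] * m[k][j]) // prev
        prev = m[k][k]
    return sign * m[n - 1][n - 1]

def all_sign_matrices(n):
    """Yield every n x n matrix with entries in {-1, +1}."""
    for entries in product((-1, 1), repeat=n * n):
        yield [entries[i * n:(i + 1) * n] for i in range(n)]

def singular_count(n):
    """Number of singular n x n sign matrices (exhaustive, tiny n only)."""
    return sum(1 for rows in all_sign_matrices(n) if det_int(rows) == 0)

if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, singular_count(n), 2 ** (n * n))
```

For $n = 2$ and $n = 3$ the counts are 8 of 16 and 320 of 512 matrices respectively. The same enumeration can be used to check the divisibility claim of Exercise 7.5.3 in small cases.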

We now sketch the proof of Theorem 7.29. It is convenient to rephrase the 
problem using the following lemma: 


Lemma 7.30 [195],[374],[364] We have

$$P(\det(M_n) = 0) = 2^{o(n)} P(X_1, \ldots, X_n \text{ span a hyperplane}).$$

Proof We already know that

$$P(\det(M_n) = 0) = P(X_1, \ldots, X_n \text{ linearly dependent}).$$


Thus the lower bound is obvious, and we need only to establish the upper. If
$X_1, \ldots, X_n$ are linearly dependent, then there must exist $0 \le d \le n - 1$ such that
$X_1, \ldots, X_{d+1}$ span a $d$-dimensional subspace. Fixing $d$ and conditioning on this
event, we see from repeated application of Corollary 7.28 that $X_1, \ldots, X_n$ will
span a hyperplane with probability $2^{-o(n)}$. The claim follows.

Using this lemma followed by the union bound, it thus suffices to show

$$\sum_V P(X_1, \ldots, X_n \text{ span } V) \le (1 - \epsilon + o(1))^n$$


where $V$ ranges over all hyperplanes. Note that we can restrict our attention to the
hyperplanes $V$ which are spanned by their intersection with $\{-1, 1\}^n$; it is easy to
see that this is a finite set. Let us call such hyperplanes non-trivial. An important
quantity associated to a non-trivial hyperplane is its density

$$P(X \in V) = \frac{|V \cap \{-1, 1\}^n|}{|\{-1, 1\}^n|}$$

where we think of $X$ as a random element of $\{-1, 1\}^n$. Note that
$P(X \in V) = P(X \cdot v = 0)$ whenever $v$ is a normal vector to $V$. We can exclude the contribution
of all the hyperplanes of low density by the following lemma:


Lemma 7.31 [195] For any $0 < \alpha < 1$, we have

$$\sum_{V : P(X \in V) \le \alpha} P(X_1, \ldots, X_n \text{ span } V) \le n\alpha.$$

Proof If $X_1, \ldots, X_n$ span the hyperplane $V$, then there exists $1 \le i \le n$ such
that the $n - 1$ vectors formed by omitting $X_i$ from $X_1, \ldots, X_n$ still span $V$. Fixing
$i$ and conditioning on this event, we see that $V$ is determined by all the vectors
other than $X_i$, and then $X_i$ has a probability of at most $\alpha$ of also lying in $V$. The
claim follows.














Thus to establish the claim, it suffices to consider only the high-density hyperplanes
for which $P(X \in V) > (1 - \epsilon)^n$. On the other hand, from Lemma 7.26
and Corollary 7.13 we can control the extremely high-density hyperplanes for
which $P(X \in V) \ge \Omega(1/\sqrt{n})$. So in fact we only need to deal with the range where
$(1 - \epsilon)^n < P(X \in V) \le O(1/\sqrt{n})$.

We now crucially exploit the relative Halász inequality, Lemma 7.14. Let $0 < \mu < 1$
be a small parameter (independent of $n$), and let $Y \in \{-1, 0, 1\}^n$ be the
random variable $Y = (\eta_1^{(\mu)}, \ldots, \eta_n^{(\mu)})$. Lemma 7.14 implies (if $n$ is large enough)
that $Y$ concentrates on the above hyperplanes $V$ more strongly than $X$ does, if $\mu$
is sufficiently small:

$$P(Y \in V) \ge \Omega\left(\frac{1}{\sqrt{\mu}}\right) P(X \in V). \qquad (7.11)$$

If we use the informal heuristic

$$P(X_1, \ldots, X_n \text{ span } V) \approx P(X \in V)^n$$

then we thus expect

$$P(X_1, \ldots, X_n \text{ span } V) \le O(\sqrt{\mu})^n P(Y_1, \ldots, Y_n \text{ span } V)$$

where $Y_1, \ldots, Y_n$ are identical independent copies of $Y$. Summing this in $V$, and
using the trivial fact that each $Y_1, \ldots, Y_n$ can span at most one hyperplane $V$, we
thus expect

$$P(\det(M_n) = 0) \le O(\sqrt{\mu})^n$$

which certainly gives Theorem 7.29 by setting $\mu$ small enough.

The above strategy almost works, except for a slight problem in that the
$Y_1, \ldots, Y_n$ may be so linearly dependent that they will only span a subspace of $V$
rather than $V$ itself. The simplest way to solve this problem is to use only a small
number of $Y$, say $Y_1, \ldots, Y_{\delta n}$, for some small$^1$ $\delta$. If $V$ is sufficiently high-density
and $\delta$ is small enough, we can ensure that $Y_1, \ldots, Y_{\delta n}$ will remain linearly independent
in $V$. This reduces the potential gain in this argument from $O(\sqrt{\mu})^n$ to
only $O(\sqrt{\mu})^{\delta n}$, but this is still enough to establish Theorem 7.29.

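More rigorously, we introduce $Y_1, \ldots, Y_{\delta n}$ independently of $X_1, \ldots, X_n$. Fix a
density $(1 - \epsilon)^n \le \sigma \le O(1/\sqrt{n})$ and let $V$ be such that $P(X \in V) = (1 + O(1/n))\sigma$.
Then by (7.11) and independence,

$$P(Y_1, \ldots, Y_{\delta n} \in V) \ge \Omega\left(\frac{1}{\sqrt{\mu}}\right)^{\delta n} \sigma^{\delta n}.$$

If $\delta$ is sufficiently small depending on $\mu$, and $\epsilon$ is sufficiently small depending on
$\delta$ and $\mu$, then one can modify Corollary 7.28 to refine this to

$$P(Y_1, \ldots, Y_{\delta n} \text{ linearly independent in } V) \ge \Omega\left(\frac{1}{\sqrt{\mu}}\right)^{\delta n} \sigma^{\delta n}; \qquad (7.12)$$

we leave this as an exercise. From independence we thus have

$$P(X_1, \ldots, X_n \text{ span } V) \le O(\sqrt{\mu})^{\delta n} \sigma^{-\delta n} P(E_V)$$

where $E_V$ is the event that $X_1, \ldots, X_n$ span $V$ and $Y_1, \ldots, Y_{\delta n}$ are linearly independent
in $V$. But if this event occurs, then there exist $n - \delta n$ vectors in $X_1, \ldots, X_n$
which, together with $Y_1, \ldots, Y_{\delta n}$, span $V$. If we fix all these vectors then $V$ is also
fixed, and the remaining $\delta n$ vectors in $X_1, \ldots, X_n$ have a probability of $O(\sigma^{\delta n})$ of
lying in $V$. We thus conclude that

$$\sum_{V : P(X \in V) = (1 + O(1/n))\sigma} P(E_V) \le \binom{n}{\delta n} O(\sigma^{\delta n})$$

which, when combined with the preceding estimates, gives

$$\sum_{V : P(X \in V) = (1 + O(1/n))\sigma} P(X_1, \ldots, X_n \text{ span } V) \le O(\sqrt{\mu})^{\delta n} \binom{n}{\delta n}.$$

If we choose $\delta$ sufficiently small depending on $\mu$, and $\epsilon$ sufficiently small depending
on $\delta$, $\mu$, we can make the right-hand side $(1 - \epsilon + o(1))^n$. Summing over all
relevant $\sigma$ (there are only about $O(n^2)$ such $\sigma$ to sum over) we obtain Theorem 7.29
as desired.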

By using Theorem 7.23 one can boost $\epsilon$ to be as large as $1/4 + o(1)$. The basic
point is that Theorem 7.23 allows one to improve (7.11) significantly unless the
hyperplane $V$ has an exceptional form (in particular, the coordinates of its normal
vector lie in a fairly small generalized progression). These exceptional hyperplanes
however are rather rare and can be treated by a direct counting argument.


1 Strictly speaking we should use $\lfloor \delta n \rfloor$ instead of $\delta n$ but we shall omit this inessential detail for ease
of exposition.

Let us conclude with a refinement of Theorem 7.29, which allows us to fix a few
rows of $M_n$. Let $Y$ be a set of $l$ independent vectors $y_1, \ldots, y_l$ and denote by $M_n^Y$
the random matrix with rows $X_1, \ldots, X_{n-l}, y_1, \ldots, y_l$, where the $X_i$ are i.i.d. copies
of the random Bernoulli vector $X$.


Theorem 7.32 [366] For any non-negative integer $l$, there is a positive constant
$\epsilon$ such that the probability that $M_n^Y$ is singular is at most $(1 - \epsilon)^n$.


Exercises 


7.5.1 Prove (7.9) and (7.10). 

7.5.2 Prove (7.12). 

7.5.3 Show that $\det(M_n) \in 2^{n-1} \cdot \mathbf{Z}$ and $|\det(M_n)| \le n^{n/2}$ for all Bernoulli
matrices $M_n$.

7.5.4 Show that $\det(M_n)$ has expectation zero and variance $n!$, and $|\det(M_n)|^2$
has expectation $n!$ and variance $n^{O(1)} (n!)^2$. Derive an upper bound for $|\det(M_n)|$.
(For a matching lower bound, see [364].)

7.5.5 [195] Show that $\sup_{x \in \mathbf{R}} P(\det(M_n) = x) \le (1 - \epsilon + o(1))^n$ for some
absolute $\epsilon > 0$.

7.5.6 [195] Show that for any £ > 0 we have 

5 P(X,,..., X, lie in V) = (0,-59(1))". 
(1—2) <P(XEV)<O (J) 

7.5.7 [195] Show that there exists an absolute constant C > O such that 
P(X), ..., Xn-c dependent) = G +o(1))” whenever n is sufficiently 
large depending on £, C. Conclude in particular that the probability that 
M, has rank n — C or less is G + o(1))”. 


7.6 The quadratic Littlewood—Offord problem 


The preceding sections studied the concentration of linear combinations of random
variables such as $\eta_1 v_1 + \cdots + \eta_n v_n$. It is also of interest to study more general
polynomial combinations. For simplicity we shall restrict ourselves to the quadratic
expression

$$Q(\eta_1, \ldots, \eta_n) = \sum_{1 \le i < j \le n} c_{i,j} \eta_i \eta_j + \sum_{i=1}^{n} d_i \eta_i$$

where $c_{i,j}, d_i$ take values in an additive group $Z$, and $\eta_1, \ldots, \eta_n$ are independent
uniformly distributed random $\pm 1$ signs.







One can now ask under what conditions one can establish upper bounds on
the concentration of the random variable $Q$. In the special case when the $c_{i,j}$ are
identically zero, we know from Corollary 7.13 that $Q$ will not concentrate at a
single point as soon as many of the $d_i$ are non-zero. One can then hope to establish
a similar result for the quadratic component, namely that $Q$ will not concentrate
at a single point as soon as many of the $c_{i,j}$ are non-zero. We give a sample result
of this form as follows:


Proposition 7.33 [64] Let $Z$ be either torsion-free or finite of odd order. Let the
notation be as above, and suppose that for at least $k$ values of $i$, we have $c_{i,j} \ne 0$ for
at least $l$ values of $j$. Then for any $x \in Z$ we have $P(Q = x) = O(\min(k, l)^{-1/8})$.


Proof Without loss of generality we may take $k \le l$. A greedy algorithm argument
shows that we can find a set $A \subset [1, n]$ of cardinality $\lfloor (k + 1)/2 \rfloor$, such that
for each $i \in A$ we have $c_{i,j} \ne 0$ for at least $\lfloor (l + 1)/2 \rfloor$ values of $j \in [1, n] \setminus A$.
The basic idea is to view the quadratic object $Q$ as a linear expression $\sum_i X_i \eta_i$
where the $X_i$ are themselves linear expressions of $\eta_1, \ldots, \eta_n$, so that one can
obtain a quadratic non-concentration result from two applications of the linear
non-concentration result. However there is a "coupling" problem, arising from the
fact that the $X_i$ and $\eta_i$ do not behave independently. This however can be resolved
via the following decoupling inequality

$$P(E(X, Y)) \le P(E(X, Y) \wedge E(X, Y'))^{1/2} \le P(E(X, Y) \wedge E(X, Y') \wedge E(X', Y) \wedge E(X', Y'))^{1/4} \qquad (7.13)$$

whenever $X, Y, X', Y'$ are independent random variables taking finitely many values,
with $X, X'$ having the same distribution and $Y, Y'$ having the same distribution,
and $E(X, Y)$ is any event depending only on $X$ and $Y$. The proof of this inequality
follows from two applications of the Cauchy-Schwarz inequality and is left as
an exercise. We apply this inequality with $X := (\eta_i)_{i \in A}$ and $Y := (\eta_j)_{j \in [1,n] \setminus A}$,
writing $Q$ as $Q(X, Y)$, to obtain

$$P(Q(X, Y) = x) \le P(Q(X, Y) = Q(X, Y') = Q(X', Y) = Q(X', Y') = x)^{1/4}$$

where $X' = (\eta'_i)_{i \in A}$ and $Y' = (\eta'_j)_{j \in [1,n] \setminus A}$ are identical independent
copies of $X$ and $Y$. In particular we have

$$P(Q(X, Y) = x) \le P(Q(X, Y) - Q(X, Y') - Q(X', Y) + Q(X', Y') = 0)^{1/4}.$$
On the other hand, we have the factorization

$$Q(X, Y) - Q(X, Y') - Q(X', Y) + Q(X', Y') = \sum_{i \in A} \sum_{j \in B} c_{i,j} (\eta_i - \eta'_i)(\eta_j - \eta'_j) = \sum_{i \in A} v_i \eta_i^{(1/2)}$$

where $B := [1, n] \setminus A$, $v_i := \sum_{j \in B} 4 c_{i,j} \eta_j^{(1/2)}$ and $\eta_j^{(1/2)} := (\eta_j - \eta'_j)/2$. Observe that the $\eta_j^{(1/2)}$ are
all independent and have the distribution of $\eta^{(1/2)}$ (i.e. they equal 0 with probability
1/2, and $\pm 1$ with probability 1/4 each). Also we make the crucial observation that
the $(v_i)_{i \in A}$ and $(\eta_i^{(1/2)})_{i \in A}$ are independent.
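
The factorization can be sanity-checked numerically on random data. The sketch below is an illustration only: all function names are hypothetical, indices are 0-based rather than 1-based, the coefficients are taken to be integers, and the coefficient attached to a pair $\{i, j\}$ is assumed to be stored at position `c[min(i, j)][max(i, j)]`. It confirms that the purely-$X$, purely-$Y$ and linear terms cancel in the second difference, leaving only the cross terms between $A$ and its complement.

```python
import random
from itertools import combinations

def quadratic_form(c, d, eta):
    """Q(eta) = sum_{i<j} c[i][j] eta_i eta_j + sum_i d[i] eta_i."""
    n = len(eta)
    quad = sum(c[i][j] * eta[i] * eta[j] for i, j in combinations(range(n), 2))
    return quad + sum(d[i] * eta[i] for i in range(n))

def assemble(x, y, A, B):
    """Merge the A-coordinates x and the B-coordinates y into one sign vector."""
    eta = [0] * (len(A) + len(B))
    for i, value in zip(A, x):
        eta[i] = value
    for j, value in zip(B, y):
        eta[j] = value
    return eta

def second_difference(c, d, A, B, x, y, x2, y2):
    """Q(X,Y) - Q(X,Y') - Q(X',Y) + Q(X',Y')."""
    def q(u, v):
        return quadratic_form(c, d, assemble(u, v, A, B))
    return q(x, y) - q(x, y2) - q(x2, y) + q(x2, y2)

def bilinear_side(c, A, B, x, y, x2, y2):
    """sum over i in A, j in B of c_{i,j} (eta_i - eta'_i)(eta_j - eta'_j)."""
    total = 0
    for s, i in enumerate(A):
        for t, j in enumerate(B):
            coeff = c[min(i, j)][max(i, j)]  # coefficient of eta_i * eta_j
            total += coeff * (x[s] - x2[s]) * (y[t] - y2[t])
    return total

def random_signs(rng, size):
    """A list of independent uniform +-1 signs."""
    return [rng.choice((-1, 1)) for _ in range(size)]

if __name__ == "__main__":
    rng = random.Random(1)
    n, A, B = 6, [0, 1, 2], [3, 4, 5]
    c = [[rng.randint(-3, 3) for _ in range(n)] for _ in range(n)]
    d = [rng.randint(-3, 3) for _ in range(n)]
    x, x2, y, y2 = (random_signs(rng, 3) for _ in range(4))
    print(second_difference(c, d, A, B, x, y, x2, y2),
          bilinear_side(c, A, B, x, y, x2, y2))
```

The two printed numbers agree for every choice of coefficients and signs, since each cross term $x_i y_j$ contributes $(x_i - x'_i)(y_j - y'_j)$ to the second difference.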

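It now suffices to show that

$$P\left(\sum_{i \in A} v_i \eta_i^{(1/2)} = 0\right) = O\left(\frac{1}{\sqrt{k}}\right).$$

For each $i \in A$, we have the easy bound

$$E(I(v_i = 0)) = P(v_i = 0) \le \frac{1}{2}$$

as can be seen by conditioning all the $\eta_j^{(1/2)}$ except for a single $j$ for which $c_{i,j} \ne 0$.
From Corollary 7.13 we also have

$$E(I(v_i = 0)) = P(v_i = 0) \le O\left(\frac{1}{\sqrt{l}}\right).$$

By linearity of expectation we thus have

$$E\left(\sum_{i \in A} I(v_i = 0)\right) \le |A| \min\left(\frac{1}{2}, O\left(\frac{1}{\sqrt{l}}\right)\right).$$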


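In particular by Markov's inequality we have

$$P\left(\sum_{i \in A} I(v_i = 0) \ge \frac{3}{4}|A|\right) \le \min\left(\frac{2}{3}, O\left(\frac{1}{\sqrt{l}}\right)\right);$$

since $|A| = \Omega(k)$, we conclude

$$P(|\{i \in A : v_i \ne 0\}| = \Omega(k)) \ge \max\left(\frac{1}{3}, 1 - O\left(\frac{1}{\sqrt{l}}\right)\right).$$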


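Now if we condition on the above event (call it $E$), then the distribution and
independence of the $\eta_i^{(1/2)}$ remain unaffected. Thus we may apply Corollary 7.13
again to obtain

$$P\left(\sum_{i \in A} v_i \eta_i^{(1/2)} = 0 \,\Big|\, E\right) = O\left(\frac{1}{\sqrt{k}}\right);$$

we also have the crude upper bound of $\frac{1}{2}$ as before. Thus

$$P\left(\sum_{i \in A} v_i \eta_i^{(1/2)} \ne 0 \,\Big|\, E\right) \ge \max\left(\frac{1}{2}, 1 - O\left(\frac{1}{\sqrt{k}}\right)\right).$$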




Combining this with the estimate on P(E) and Bayes’ formula, we obtain the 
claim. 














In [64] this estimate was used, together with some techniques from the preceding 
section, to obtain 


Theorem 7.34 [64] Let $M_n$ be a random symmetric $n \times n$ matrix whose entries
are random uniformly distributed signs $\pm 1$, and with the entries in the upper
triangular half being independent. (The entries in the strictly lower triangular
half are of course determined from the upper half by symmetry.) Then
$P(\det(M_n) = 0) = O_\epsilon(n^{-1/8+\epsilon})$ for any $\epsilon > 0$.


Exercises 


7.6.1 Give examples that show that for arbitrary $k, l \ge 1$, there exists $Q$ obeying
the hypothesis in Proposition 7.33 with $P(Q = 0) = \Omega(\min(k, l)^{-1/2})$.
Thus, except for the exponent 1/8 and for absolute constants, the conclusion
in Proposition 7.33 is best possible.

7.6.2 Obtain a generalization of Proposition 7.33 to polynomials of degree $d$
in $\eta_1, \ldots, \eta_n$, with 1/8 replaced by an exponent depending on $d$.

7.6.3 Improve the constant 1/8 in Proposition 7.33 to 1/4.

7.6.4 (Meshulam, private communication) Find a quadratic form
$Q = \sum_{i,j} c_{i,j} \xi_i \xi_j$, where $c_{i,j} \ne 0$ for all $i, j$ and the $\xi_i$ are i.i.d. Bernoulli
random variables, such that


PO =0)>Q2- on eed 


Compare this to the linear case (Corollary 7.4). 


8 Incidence geometry


Incidence geometry deals with the incidences among basic geometrical objects 
such as points, lines and spheres. One can obtain useful and non-trivial information 
on these incidences by the classical combinatorial technique of double-counting 
the number of a certain type of configuration of incidences in two different ways. 

In many situations, tools from incidence geometry, combined with a clever
double counting argument, provide a simple yet powerful approach to hard problems.
The goal of this chapter is to demonstrate several such applications, including
several in additive combinatorics.

The material is organized as follows. We start with a result on the crossing
number of graphs, which has a topological flavor. Next, we use this result to
give a simple proof of the famous Szemerédi-Trotter theorem concerning point-line
incidences. In the next two sections, we use this theorem to prove several bounds on
the Erdős-Szemerédi sum-product problem and reprove Andrews' theorem on the
number of lattice points in a convex polygon. Next, we introduce the method of cell
decomposition and use it to treat the Erdős distinct distances problem in $\mathbf{R}^d$. Finally, we
discuss a variant of the Erdős-Szemerédi sum-product problem for complex numbers.


8.1 The crossing number of a graph 


In this chapter, a point refers to a point in the plane $\mathbf{R}^2$, and a line refers to a line in
$\mathbf{R}^2$, unless otherwise specified. By a curve, we refer to the image of a continuous
injective embedding$^1$ of a compact interval $[0, 1]$ into $\mathbf{R}^2$.


1 In applications one deals with very explicit curves such as circular arcs or straight lines, and so we
could restrict the class of curves to these sorts of objects if desired. In this way one does not need to 
invoke any difficult results from topology such as the Jordan curve theorem (which is implicit in 
our application of Euler’s formula). 




Consider a graph $G = G(V, E)$; recall we assume our graphs $G$ to be undirected
and have no loops or repeated edges. A drawing of $G$ is any representation of $G$
in the plane $\mathbf{R}^2$ by identifying each vertex in $V$ with a distinct point, and each
edge $(u, v)$ in $E$ with a curve in $\mathbf{R}^2$ connecting $u$ and $v$. The crossing number
of such a drawing is the number of pairs of edges with no common endpoints,
where the corresponding curves intersect each other. The crossing number of $G$
is the minimum number of crossings in a drawing. Here and later, we denote this
parameter by $\mathrm{cross}(G)$.

It is expected that if $G$ has many edges, then its crossing number is large. The
following theorem, which confirmed this intuition, was proved by Ajtai, Chvátal,
Newborn and Szemerédi [1], and, independently, by Leighton [224].


Theorem 8.1 Let $G = G(V, E)$ be a graph with $|E| \ge 4|V|$. Then

$$\mathrm{cross}(G) \ge \frac{|E|^3}{64|V|^2}.$$

Proof A planar graph is a graph whose crossing number is zero. It is well known
(and can be easily proved using Euler's formula) that a planar graph $G = G(V, E)$
has at most $3|V|$ edges (in fact it has at most $3|V| - 6$ if $|V| \ge 3$). Now observe
that any graph $G$ can be made planar by removing at most $\mathrm{cross}(G)$ edges (one for
each crossing that occurs in an optimal drawing of $G$). Combining these two facts
we obtain the preliminary inequality

$$\mathrm{cross}(G) \ge |E| - 3|V| \qquad (8.1)$$

for an arbitrary graph $G(V, E)$.

This bound is, of course, much weaker than what we want to prove. However it
is possible to amplify (8.1) substantially via the first moment method as follows.

Fix $G = G(V, E)$ with $|E| \ge 4|V|$, and let $0 < p \le 1$ be a parameter to be
chosen later. Let $V'$ be a random subset of $V$, chosen so that the events $v \in V'$ are
independent with probability $p$. Let $G' = G'(V', E')$ be the induced subgraph of
$G$ spanned by $V'$. Applying (8.1) to $G'$ and then taking expectations, we see from
linearity of expectation that

$$E(\mathrm{cross}(G')) \ge E(|E'|) - 3E(|V'|).$$

Further application of linearity of expectation shows that

$$E(|V'|) = p|V|; \quad E(|E'|) = p^2|E|,$$

since each vertex has probability $p$ of being included in $V'$, and each edge has
probability $p^2$ of being included in $E'$. Now consider a drawing of $G$ with exactly
$\mathrm{cross}(G)$ crossings. Each crossing involves four vertices of $V$ and thus has a
probability $p^4$ of surviving when we pass to $G'$. Using linearity of expectation one
last time we conclude

$$E(\mathrm{cross}(G')) \le p^4 \mathrm{cross}(G);$$

we have inequality rather than equality since the drawing of $G'$ constructed here
may not have the minimal number of crossings. Putting all this together we have

$$\mathrm{cross}(G) \ge p^{-2}|E| - 3p^{-3}|V|.$$

The claim then follows by setting $p := 4|V|/|E|$.
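
The final substitution can be verified with exact rational arithmetic. The following sketch (function names are illustrative only, not from the text) evaluates the amplified bound $p^{-2}|E| - 3p^{-3}|V|$ at $p = 4|V|/|E|$ and compares it with the preliminary bound (8.1); it checks the algebra of the last step, not the theorem itself.

```python
from fractions import Fraction

def preliminary_bound(num_vertices, num_edges):
    """Inequality (8.1): cross(G) >= |E| - 3|V|."""
    return num_edges - 3 * num_vertices

def amplified_bound(num_vertices, num_edges, p):
    """The bound p^-2 |E| - 3 p^-3 |V| obtained after sampling with probability p."""
    return Fraction(num_edges) / p**2 - Fraction(3 * num_vertices) / p**3

def crossing_lemma_bound(num_vertices, num_edges):
    """Value of the amplified bound at p = 4|V|/|E| (requires |E| >= 4|V|)."""
    if num_edges < 4 * num_vertices:
        raise ValueError("need |E| >= 4|V| so that p <= 1")
    p = Fraction(4 * num_vertices, num_edges)
    return amplified_bound(num_vertices, num_edges, p)

if __name__ == "__main__":
    v, e = 100, 4950  # vertex and edge counts of the complete graph on 100 vertices
    print(preliminary_bound(v, e), float(crossing_lemma_bound(v, e)))
```

With $p = 4|V|/|E|$ the two terms are $|E|^3/(16|V|^2)$ and $3|E|^3/(64|V|^2)$, whose difference is exactly $|E|^3/(64|V|^2)$.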














Remark 8.2 One can improve the bound on $\mathrm{cross}(G)$ slightly by optimizing $p$.
To obtain a more significant improvement, one needs additional arguments. The
current best bound is due to Pach and Tóth [271].


Exercises 


8.1.1 Let $G(V, E)$ be a planar graph with no loops or multiple edges. Using
Euler's formula $V - E + F = 2$, show that $|E| \le \max(3|V| - 6, 1)$.
Show that this bound $\max(3|V| - 6, 1)$ is best possible.

8.1.2 Show that a planar graph has a vertex of degree at most 5. Use this
fact and induction to prove that a planar graph is vertex-colorable by
6 colors, where of course we require adjacent vertices to have distinct
colors. Without using the four-color theorem, refine the argument to show
that in fact every planar graph is vertex-colorable by 5 colors. (Hint: given
any two colors, say red and green, one can swap all the red and green
colors in a single red-green connected component without difficulty. Now
given four colors red, blue, green, white adjacent in that order around a
single uncolored vertex $v$, it cannot simultaneously be true that the red and
green vertices lie in the same red-green connected component, and the
blue and white vertices lie in the same blue-white connected component.)

8.1.3 Show that for any $n, e \ge 1$ with $e \ge 4n$ there exists a graph $G =
G(V, E)$ with $n$ vertices and $e$ edges such that $\mathrm{cross}(G) = O(e^3/n^2)$,
and so the crossing number inequality cannot be improved except for
constants. (Hint: There are many ways to generate an example. One is to
connect adjacent and nearly-adjacent points on the unit circle. Another is
to use Exercise 8.2.2.)

8.1.4 Show that for any graph $G = G(V, E)$, we have
$|E| = O(|V| + |V|^{2/3} \mathrm{cross}(G)^{1/3})$.

8.1.5 [342] Let $m \ge 1$ be an integer, and let $G = G(V, E)$ be a multigraph
with maximum edge multiplicity $m$, thus each pair of vertices
are allowed to be connected by up to $m$ edges. Define the crossing
number of a multigraph in the obvious manner. Show that if
$|E| \ge 5m|V|$, then $\mathrm{cross}(G) = \Omega(|E|^3/(m|V|^2))$. In particular we have
$|E| = O(m|V| + m^{1/3}|V|^{2/3} \mathrm{cross}(G)^{1/3})$.


8.2 The Szemerédi—Trotter theorem 


Given a finite collection of points $P$ and lines $L$, a basic question is to bound the
number

$$I(P, L) := |\{(p, l) \in P \times L : p \in l\}|$$

of incidences between $P$ and $L$. Clearly we can make $I(P, L)$ as small as zero
without any difficulty, so the interesting question is to maximize $I(P, L)$ for fixed
cardinalities $|P|$ and $|L|$. One of course has the trivial bound $I(P, L) \le |P||L|$,
and one can improve this further without difficulty to

$$I(P, L) \le \min(|P|^{1/2}|L| + |P|, |L|^{1/2}|P| + |L|); \qquad (8.2)$$

see exercises. In [348], Szemerédi and Trotter proved the following stronger estimate,
which is sharp up to constants.


Theorem 8.3 (Szemerédi-Trotter theorem) Let $P$ be a finite set of points and
let $L$ be a finite set of lines. Then we have

$$I(P, L) \le 4|P|^{2/3}|L|^{2/3} + 4|P| + |L|.$$

Proof We may remove those lines $l \in L$ which do not contain any points in $P$, as
they contribute nothing to the left-hand side. Thus we may assume that every line
in $L$ contains at least one point in $P$. Now let $G = G(P, E)$ be the graph whose
vertices are the points in $P$, and two points $a$ and $b$ are connected if and only if
the open line segment from $a$ to $b$ lies in a line in $L$ and contains no points in $P$.

We now apply the double counting method to $|E|$, the number of edges. Observe
that if a line $l$ in $L$ contains $k \ge 1$ points in $P$, then $l$ contributes $k - 1$ edges to
$E$. Summing over $l \in L$, we conclude

$$|E| = I(P, L) - |L|.$$

On the other hand, observe that $G$ has a tautological drawing, with the vertices in
$P$ mapping to themselves, and the edge $[a, b]$ mapping to the line segment from $a$
to $b$. Since any two lines in $L$ can intersect in at most one point, we conclude that
$\mathrm{cross}(G) \le |L|^2$. Applying the crossing number inequality, we conclude that either
$|E| < 4|P|$ or $\mathrm{cross}(G) \ge |E|^3/64|P|^2$. Thus $|E| \le \max(4|P|, 4|P|^{2/3}|L|^{2/3})$,
and the claim follows.
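
As a concrete illustration of the theorem, one can count incidences by brute force in a grid configuration of the kind suggested in the hint to Exercise 8.2.2 and compare the count with the bound of Theorem 8.3. The particular choice of points $[1, a] \times [1, 2ab]$ and lines $y = mx + c$ with $1 \le m \le b$, $1 \le c \le ab$ is an assumption made here for illustration, and the function names are hypothetical.

```python
def grid_configuration(a, b):
    """Points [1,a] x [1,2ab] and lines y = m*x + c with 1 <= m <= b, 1 <= c <= ab."""
    points = {(x, y) for x in range(1, a + 1) for y in range(1, 2 * a * b + 1)}
    lines = [(m, c) for m in range(1, b + 1) for c in range(1, a * b + 1)]
    return points, lines

def count_incidences(points, lines):
    """Brute-force count of pairs (point, line) with the point on the line."""
    xs = sorted({x for x, _ in points})
    return sum(1 for (m, c) in lines for x in xs if (x, m * x + c) in points)

def szemeredi_trotter_bound(num_points, num_lines):
    """Right-hand side 4|P|^(2/3)|L|^(2/3) + 4|P| + |L| of Theorem 8.3."""
    return 4 * (num_points * num_lines) ** (2 / 3) + 4 * num_points + num_lines

if __name__ == "__main__":
    points, lines = grid_configuration(6, 4)
    incidences = count_incidences(points, lines)
    print(len(points), len(lines), incidences,
          szemeredi_trotter_bound(len(points), len(lines)))
```

In this configuration every line meets the grid in exactly $a$ points, so there are $a^2 b^2$ incidences on $2a^2 b$ points and $ab^2$ lines, which is of the same order as $|P|^{2/3}|L|^{2/3}$.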
















Remark 8.4 The above proof is due to Székely [342]; the original proof of
Szemerédi and Trotter is quite different (see Exercise 8.4.7 for a proof closer
in spirit to that). The symmetry between $P$ and $L$ can be explained by projective
duality; if we embed the plane $\mathbf{R}^2$ into the projective space of $\mathbf{R}^3$, then points
become associated to subspaces of $\mathbf{R}^3$ of dimension 1, while lines are associated
to subspaces of codimension 1.


Let us now derive a few corollaries from the theorem. An immediate consequence,
which we leave as an exercise, allows us to bound the number of lines which
are "rich" in the sense that they contain many elements of a given set $P$ of points.


Corollary 8.5 (Rich lines) If $P$ is any finite set of points and $k \ge 2$, then

$$|\{l : |P \cap l| \ge k\}| = O\left(\max\left(\frac{|P|^2}{k^3}, \frac{|P|}{k}\right)\right).$$

Dually, for any finite set $L$ of lines, we have

$$|\{p \in \mathbf{R}^2 : |\{l \in L : p \in l\}| \ge k\}| = O\left(\max\left(\frac{|L|^2}{k^3}, \frac{|L|}{k}\right)\right).$$


Remark 8.6 In typical applications, such as those below, $k \le |P|^{1/2}$ so the term
$|P|^2/k^3$ is dominating. The case $k > |P|^{1/2}$ can be treated by the cruder estimate (8.2).
Similarly for the second half of the corollary.

Next, we bound the number of pairs of points which are connected by a rich
line.


Corollary 8.7 (Rich pairs) If $P$ is any finite set of points and $k \ge 1$, then

$$|\{(p, q) \in P \times P : p \ne q; k \le |l_{p,q} \cap P| < 2k\}| = O\left(\max\left(\frac{|P|^2}{k}, k|P|\right)\right)$$

where $l_{p,q}$ is the unique line connecting $p$ and $q$. In particular, if $1 \le k \le |P|^{1/2}$,
then

$$|\{(p, q) \in P \times P : p \ne q; k \le |l_{p,q} \cap P| \le |P|^{1/2}\}| = O\left(\frac{|P|^2}{k}\right).$$


Proof For the first bound, we observe that each line $l$ with $k \le |l \cap P| < 2k$
contributes at most $O(k^2)$ pairs to the left-hand side, so the claim follows from
Corollary 8.5. The second bound follows from the first by a standard dyadic decomposition
argument.














An easy modification of this argument, which we leave as an exercise, allows
us to also control collinear triples that are not on too rich of a line:


Corollary 8.8 (Collinear triples) Let $P$ be a finite set of points. Then the number
of triples $(u, v, w)$ where $u, v, w$ are three collinear distinct points in $P$, whose
line contains at most $|P|^{1/2}$ points in $P$, is at most $O(|P|^2 \log |P|)$.


Applying this in particular to Cartesian products $P = A \times B$, where $A, B$ are
sets of real numbers with $|A| = |B| = m$, we observe that $|P| = m^2$ and no line
intersects $P$ in more than $|P|^{1/2} = m$ points. We conclude


Corollary 8.9 Let $A$ and $B$ be sets of real numbers of cardinality $m$. Then $A \times B$
contains at most $O(m^4 \log m)$ collinear triples.
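
For small Cartesian products the collinear triples can simply be enumerated. The sketch below counts ordered triples of distinct collinear points in $A \times B$ for integer sets $A$, $B$, using an exact cross-product test; the function names are illustrative, and the cubic-time enumeration over all triples of points is only meant for very small inputs.

```python
from itertools import permutations

def collinear(p, q, r):
    """True if the three points lie on a common line (exact for integer inputs)."""
    return (q[0] - p[0]) * (r[1] - p[1]) == (q[1] - p[1]) * (r[0] - p[0])

def count_collinear_triples(A, B):
    """Number of ordered triples of distinct collinear points in A x B."""
    points = [(a, b) for a in A for b in B]
    return sum(1 for p, q, r in permutations(points, 3) if collinear(p, q, r))

if __name__ == "__main__":
    for m in (2, 3, 4):
        grid = range(m)
        print(m, count_collinear_triples(grid, grid), m ** 4)
```

For the $3 \times 3$ integer grid there are eight lines containing three grid points (three rows, three columns, two diagonals), giving 48 ordered triples.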


It is an easy matter to extend the Szemerédi—Trotter theorem to more general 
curves than lines. 


Theorem 8.10 (Generalized Szemerédi-Trotter theorem) [342] Let $P$ be a
finite collection of points in $\mathbf{R}^2$, and let $L$ be a finite collection of curves in $\mathbf{R}^2$.
Suppose that any two curves in $L$ intersect in at most $\alpha$ points, and any two points
in $P$ are simultaneously incident to at most $\beta$ curves; then

$$|\{(p, l) \in P \times L : p \in l\}| = O(\alpha^{1/3} \beta^{1/3} |P|^{2/3} |L|^{2/3} + |L| + \beta|P|).$$


As an application of this theorem we prove the following remarkable result of 
Andrews [13]. 


Theorem 8.11 Let $\Gamma \subset \mathbf{R}^2$ be a lattice (e.g. $\Gamma = \mathbf{Z}^2$). If $C$ is a convex $n$-gon with
vertices in $\Gamma$, then the interior of $C$ contains $\Omega(n^3)$ lattice points.


Proof Let $\partial C$ be the boundary of $C$ and $F$ be the collection of (piecewise linear) curves
obtained by translating $\partial C$ by the lattice points inside $C$. Let $P$ be the set of lattice
points covered by the union of the curves in $F$ and $m$ be the number of lattice
points inside $C$. We have $|F| = m$ and $|P| = \Theta(m)$ (cf. (3.10)).

We apply the double counting method to the number of incidences between
$P$ and $F$. On the one hand, the generalized Szemerédi-Trotter theorem gives an
upper bound of $O(m^{4/3})$ for these incidences. On the other hand, each translate of
$\partial C$ contains at least $n$ points of $P$ (the translated vertices), so the number of incidences
is at least $nm$. Comparing these bounds we obtain $m = \Omega(n^3)$ as desired.














Remark 8.12 The above theorem generalizes to $\mathbf{R}^d$. For any fixed $d$, Andrews
proved that a convex polytope in $\mathbf{R}^d$ with $n$ non-coplanar integral points on its
boundary has volume $\Omega(n^{(d+1)/(d-1)})$. The above proof, however, does not generalize
to higher dimensions.


An important open problem is to extend the Szemerédi-Trotter theorem to
planes over other fields, for instance the complex plane $\mathbf{C}^2$ or the finite field
planes $F_p^2$. The crude estimate (8.2) applies in all of these situations, but one
would like to improve this bound. In the case of $F_p^2$ it was shown that
$I(P, L) = O_\delta(\max(|P|, |L|)^{3/2-\epsilon})$ whenever $|P|, |L| \le p^{2-\delta}$ for all $\delta > 0$ and some $\epsilon(\delta) > 0$
depending on $\delta$; see [43], [44]. The main ingredients in this argument were
the sum-product estimate in Corollary 2.58 and the Balog-Szemerédi-Gowers
theorem (Theorem 2.29).


Exercises 


8.2.1 Using only the basic facts that two distinct points determine at most one
line, and two distinct lines intersect in at most one point, together with
the Cauchy-Schwarz inequality, prove (8.2). Observe that this argument
works over any field, not just $\mathbf{R}$. In the case where the field is $F_{p^2}$, show that
the bound can be sharp when $|P| = |L| = p^2$, or when $|P| = |L| = p^4$.

8.2.2 Let $n, m \ge 1$ be given. Find an example of a set of points $P$ and a
set of lines $L$ such that $|P| = n$, $|L| = m$, and the number of incidences
between $P$ and $L$ is $\Omega(n^{2/3}m^{2/3} + n + m)$, thus demonstrating
that the Szemerédi-Trotter theorem is sharp up to constants. (Hint: consider
sets $P$ of the form $P = [1, a] \times [1, ab]$ for various parameters
$a, b$.)

8.2.3 Prove Corollary 8.5.

8.2.4 Prove Corollary 8.8.

8.2.5 Let $P$ be a finite set of points, and let $k \ge 2$. Show that

$$|\{(p, l) : p \in P; l \text{ a line}; p \in l; |l \cap P| \ge k\}| = O\left(\frac{|P|^2}{k^2} + |P| \log |P|\right).$$

8.2.6 (Beck's theorem) [19] Let $P$ be a finite set of points. Show that either
there exists a line that is incident to $\Theta(|P|)$ points in $P$, or there exist
$\Theta(|P|^2)$ lines that are each incident to at least two points in $P$.

8.2.7 (Sylvester-Gallai theorem) Let $P$ be a finite set of points, not all of which
are collinear. Show that there exists a line that contains exactly two points
in $P$. (Hint: minimize the quantity $\mathrm{dist}(p, l)$, where $l$ is a line containing
two or more points in $P$ and $p \in P \setminus l$. Using elementary geometry, show
that this quantity is minimized only when $l$ contains exactly two points
from $P$.)

8.2.8 Prove Theorem 8.10. (Hint: use Exercise 8.1.5.)

8.2.9 Let $\gamma$ be a strictly convex curve in $\mathbf{R}^2$. Show that $|(R \cdot \gamma) \cap \Gamma| =
O_\gamma(R^{2/3})$ for all $R \ge 1$ and all lattices $\Gamma$.

8.2.10 Let $\gamma$ be a strictly convex curve in $\mathbf{R}^2$, and let $A$ be a finite set in $\mathbf{R}^2$. Show
that $|\{(a, a') \in A \times A : a - a' \in \gamma\}| = O(|A|^{4/3})$. Deduce from this that
$|\{|x - y| : x, y \in A\}| = \Omega(|A|^{2/3})$.




8.3 The sum-product problem in R 


In Section 2.8 we considered the sum-product problem, where one wished to
establish lower bounds on either the sum set $A + A$ or the product set $A \cdot A$ when
$A$ was an arbitrary non-empty finite subset of a field or ring. For instance, it was
shown there that if the ambient field contained no proper subfields, then one had
$|A + A| + |A \cdot A| = \Omega(|A|^{1+\epsilon})$ for some explicit $\epsilon > 0$. In the case when $A$ is a set
of integers (or more generally of real numbers), Erdős and Szemerédi conjectured
the following stronger result:


Conjecture 8.13 (Erdős-Szemerédi conjecture) [91] Let $A$ be a finite non-empty
set of integers or reals. Then for any $\epsilon > 0$ we have

$$|A + A| + |A \cdot A| \ge \Omega_\epsilon(|A|^{2-\epsilon}).$$


The condition $\epsilon > 0$ is sharp; see Exercise 8.3.6.

In support of this conjecture, Erdős and Szemerédi [91] proved the bound
$|A + A| + |A \cdot A| \ge \Omega(|A|^{1+\delta})$ for some absolute constant $\delta > 0$, when $A$ is a
set of integers. Nathanson [258] showed that one can set $\delta = 1/31$. Ford [105]
improved $\delta$ to 1/15. These proofs relied on properties of factorizations.

In 1997, Elekes [76] improved $\delta$ to 1/4 and extended the result to the case of real numbers,
using the Szemerédi-Trotter theorem in an ingenious way.


Theorem 8.14 Let $A$ be a finite non-empty set of reals. Then

$$|A + A| \times |A \cdot A| = \Omega(|A|^{5/2}).$$

In particular

$$|A + A| + |A \cdot A| = \Omega(|A|^{5/4}).$$


Proof Let $P = \{(a, b) : a \in A + A, b \in A \cdot A\}$; $P$ is a subset of the plane and has
cardinality $|A + A||A \cdot A|$.

Consider the set $L$ of lines of the form $\{(x, y) : y = a(x - b)\}$ where $a, b$ are
elements of $A$. Clearly, $L$ has $|A|^2$ elements. Moreover, each such line contains at
least $|A|$ points in $P$, namely the points $(b + c, ac)$ with $c \in A$. Thus $I(P, L) \ge
|A|^3$. Applying the Szemerédi-Trotter theorem we conclude

$$|A|^3 \le O\left((|A + A||A \cdot A|)^{2/3} (|A|^2)^{2/3} + |A + A||A \cdot A| + |A|^2\right),$$

and the claim follows by elementary algebra.
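
The point-line configuration in this proof is easy to reproduce on a small example. The sketch below (names are illustrative; the set $A$ is assumed to consist of non-zero integers so that the arithmetic is exact and the lines are distinct) builds the point set $(A + A) \times (A \cdot A)$ implicitly and counts, for each line $y = a(x - b)$, how many of those points it contains.

```python
def elekes_configuration(A):
    """Return the sum set, the product set and the line parameters (a, b)."""
    sums = {a + b for a in A for b in A}
    products = {a * b for a in A for b in A}
    lines = {(a, b) for a in A for b in A}  # (a, b) encodes the line y = a * (x - b)
    return sums, products, lines

def points_on_line(a, b, sums, products):
    """Points of (A + A) x (A . A) lying on the line y = a * (x - b)."""
    return {(x, a * (x - b)) for x in sums if a * (x - b) in products}

def elekes_incidences(A):
    """Return (|P|, |L|, I(P, L)) for the configuration built from A."""
    sums, products, lines = elekes_configuration(A)
    incidences = sum(len(points_on_line(a, b, sums, products)) for a, b in lines)
    return len(sums) * len(products), len(lines), incidences

if __name__ == "__main__":
    A = set(range(1, 11))
    print(elekes_incidences(A), len(A) ** 3)
```

Each line contains the $|A|$ points $(b + c, ac)$ with $c \in A$, so the incidence count is always at least $|A|^3$, which is the lower bound fed into the Szemerédi-Trotter theorem.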





Very recently, Solymosi [324] added a new twist to Elekes' argument, essentially
improving $\delta$ to 3/11.


Theorem 8.15 [324] Let $A$ be a finite set of real numbers with $|A| \ge 2$.
Then we have $|A + A|^8 |A/A|^3 = \Omega(|A|^{14})$, and $|A + A|^8 |A \cdot A|^3 = \Omega\left(\frac{|A|^{14}}{\log^3 |A|}\right)$.
Consequently

$$|A + A| + |A \cdot A| = \Omega(|A|^{14/11}/\log^{3/11} |A|). \qquad (8.3)$$


Proof We may remove zero if necessary and assume that all elements of $A$ are
non-zero. We shall need a dyadic decomposition of $A/A$, in order to control the
multiplicity of quotients in $A/A$ from both above and below. Let $2 \leq d \leq |A|$ be
a power of two to be chosen later, and let $D_d \subseteq A/A$ be the set

$$D_d := \{m \in A/A : m = a_1/a_2 \text{ for between } d \text{ and } 2d \text{ values of } (a_1, a_2) \in A \times A\}.$$

Let $P := A \times A$, and let $L$ denote all the lines $\{(x, y) : y = mx + b\}$ with slope
$m$ in $D_d$, and which contain at least one point in $P$. Observe that $L$ is finite, and
that each point $p \in P$ is incident to $|D_d|$ lines in $L$. Thus by Corollary 8.5 we have

$$|A|^2 = |P| = O\left(\frac{|L|^2}{|D_d|^3} + \frac{|L|}{|D_d|}\right);$$

since $|D_d| \leq |A/A| \leq |A|^2$, this implies a lower bound on $|L|$:

$$|L| = \Omega(|A||D_d|^{3/2}). \qquad (8.4)$$


Now let $P' := (A + A) \times (A + A)$. Observe that if $l \in L$, then $l$ has some
slope $m \in D_d$ and contains a point $(a_1, a_2)$ in $P$. In particular, $l \cap P'$ contains
the set $\{(a_1 + a_3, a_2 + a_4) : a_3, a_4 \in A; a_4/a_3 = m\}$, which has cardinality at least
$d$ by definition of $D_d$. Thus each line in $L$ contains at least $d$ points in $P'$; by
Corollary 8.5 again, we conclude that

$$|L| = O\left(\frac{|P'|^2}{d^3} + \frac{|P'|}{d}\right) = O\left(\frac{|P'|^2}{d^3}\right),$$

where the latter bound follows since $d \leq |A|$ and $|P'| \geq |A|^2$. Inserting (8.4) and
$|P'| = |A + A|^2$ we obtain after some algebra

$$|D_d| = O\left(\frac{|A + A|^{8/3}}{|A|^{2/3} d^2}\right). \qquad (8.5)$$

In particular, by definition of $D_d$,

$$|\{(a_1, a_2) \in A \times A : a_1/a_2 \in D_d\}| = O\left(\frac{|A + A|^{8/3}}{|A|^{2/3} d}\right).$$


Summing this over $d$ equal to all powers of two greater than $C|A + A|^{8/3}/|A|^{8/3}$
for some large absolute constant $C$, we obtain

$$|\{(a_1, a_2) \in A \times A : a_1/a_2 \in D_d \text{ for some } d > C|A + A|^{8/3}/|A|^{8/3}\}| \leq \frac{1}{2}|A|^2$$

and hence

$$|\{(a_1, a_2) \in A \times A : a_1/a_2 \in D_d \text{ for some } d \leq C|A + A|^{8/3}/|A|^{8/3}\}| \geq \frac{1}{2}|A|^2.$$

But for $d$ as above, each $m \in D_d$ has at most $O(d) = O(|A + A|^{8/3}/|A|^{8/3})$
representations of the form $a_1/a_2$, and so we can conclude that
$|A/A| = \Omega(|A|^2/(|A + A|^{8/3}/|A|^{8/3}))$, which gives the first inequality.

To prove the second inequality, we observe from (8.5) that

$$|\{(a_1, a_2, a_3, a_4) \in A \times A \times A \times A : a_1/a_2 = a_3/a_4 \in D_d\}| = O\left(\frac{|A + A|^{8/3}}{|A|^{2/3}}\right);$$

note that while the above argument was only for $d \geq 2$, the estimate here also holds
for $d = 1$ by crudely bounding the left-hand side by $O(|A|^2)$ and bounding $|A + A|$
from below by $|A|$. Summing this over $d$ equal to all powers of 2 between 1 and
$|A|$, we obtain

$$|\{(a_1, a_2, a_3, a_4) \in A \times A \times A \times A : a_1/a_2 = a_3/a_4\}| = O\left(\frac{|A + A|^{8/3}}{|A|^{2/3}} \log |A|\right).$$

On the other hand, by a simple double counting argument (cf. (2.8)) we have

$$|\{(a_1, a_2, a_3, a_4) \in A \times A \times A \times A : a_1/a_2 = a_3/a_4\}| \geq |A|^4/|A \cdot A|,$$

and the claim follows.
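The dyadic decomposition of $A/A$ by multiplicity, and the double counting bound on quadruples, can be illustrated on a small set of positive integers. This is a minimal sketch with hypothetical function names; as an assumption made for definiteness, the class with parameter $d$ is taken to hold the quotients whose multiplicity $r$ satisfies $d \leq r < 2d$, so that the classes are disjoint.

```python
from collections import Counter
from fractions import Fraction

def quotient_multiplicities(A):
    """Map each quotient m in A/A to the number of pairs (a1, a2) with a1/a2 = m."""
    return Counter(Fraction(a1, a2) for a1 in A for a2 in A)

def dyadic_classes(A):
    """Group quotients by multiplicity: class d (a power of two) holds the
    quotients whose multiplicity r satisfies d <= r < 2d (assumed convention)."""
    classes = {}
    for m, r in quotient_multiplicities(A).items():
        d = 1 << (r.bit_length() - 1)   # largest power of two not exceeding r
        classes.setdefault(d, set()).add(m)
    return classes

def multiplicative_energy(A):
    """Number of quadruples with a1/a2 = a3/a4: the sum of squared multiplicities."""
    return sum(r * r for r in quotient_multiplicities(A).values())

A = [1, 2, 4, 8]
print(dyadic_classes(A))
print(multiplicative_energy(A), len(A) ** 4 / len({a * b for a in A for b in A}))
```

For a geometric progression most quotients have high multiplicity, so the energy is large and $|A \cdot A|$ is small, which is the regime in which the second inequality of Theorem 8.15 is informative.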








A special case which has drawn a lot of attention is when either $|A + A|$ or $|A \cdot A|$
is small. Elekes and Ruzsa [80] proved the following theorem.


Theorem 8.16 Let $A$ be a finite set of real numbers with $|A| \geq 2$. Then

$$|A + A|^4 |A \cdot A| = \Omega\left(\frac{|A|^6}{\log |A|}\right).$$

In particular, if $|A + A| = O(|A|)$, then $|A \cdot A| = \Omega(|A|^2/\log |A|)$.





The logarithmic factor is necessary; if one has $A := [1, n]$ then it is known that
$|A \cdot A| = O(n^2/\log^c n)$ for some positive constant $c$. (See also Exercise 8.3.6.)





Proof It is easy to reduce to the case when the elements of $A$ are positive. Let $P :=
((A + A) \cup A) \times ((A + A) \cup A)$; thus $P$ is a collection of points of cardinality
$O(|A + A|^2)$. We shall apply the double counting method to the number of collinear
triples in $P$. On the one hand, Corollary 8.9 shows that the number of such triples
is $O(|A + A|^4 \log |A|)$. On the other hand, a standard Cauchy-Schwarz argument
(cf. (2.8)) shows that

$$|\{(a, b, c, d) \in A \times A \times A \times A : ab = cd\}| \geq \frac{|A|^4}{|A \cdot A|}.$$




We may assume $|A \cdot A| \leq \frac{1}{2}|A|^2$ since the claim is trivial otherwise. We can then
discard the quadruples with $a = d$ (of which there are only $|A|^2$) and conclude

$$|\{(a, b, c, d) \in A \times A \times A \times A : ab = cd; a \neq d\}| = \Omega\left(\frac{|A|^4}{|A \cdot A|}\right).$$

For any $(a, b, c, d)$ in the above set and $e, f \in A$, observe that the three points
$(e, f)$, $(e + a, f + c)$, $(e + d, f + b)$ form a collinear triple in $P$. The number of
triples obtained in this manner is $\Omega(|A|^6/|A \cdot A|)$. Combining this with the upper bound,
the claim follows.
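The construction of collinear triples from multiplicative quadruples can be verified directly. The sketch below is an illustration with hypothetical function names, and it assumes the triple attached to a quadruple $(a, b, c, d)$ with $ab = cd$, $a \neq d$ and a pair $e, f \in A$ is $(e, f)$, $(e + a, f + c)$, $(e + d, f + b)$; the collinearity condition for these three points is exactly $ab = cd$.

```python
from itertools import product

def collinear(p, q, r):
    """True if the points p, q, r lie on a common line (exact integer test)."""
    return (q[0] - p[0]) * (r[1] - p[1]) == (q[1] - p[1]) * (r[0] - p[0])

def triples_from_quadruples(A):
    """Collect the triples (e,f), (e+a,f+c), (e+d,f+b) with ab = cd and a != d,
    checking that each lies in P = ((A+A) u A)^2 and is collinear."""
    S = set(A) | {x + y for x in A for y in A}   # the set (A + A) union A
    triples = set()
    for a, b, c, d in product(A, repeat=4):
        if a * b == c * d and a != d:
            for e, f in product(A, repeat=2):
                t = ((e, f), (e + a, f + c), (e + d, f + b))
                assert all(x in S and y in S for x, y in t)
                assert collinear(*t)
                triples.add(t)
    return triples

print(len(triples_from_quadruples([1, 2, 4])))
```

Since a triple determines $e, f, a, c, d, b$, distinct choices give distinct triples, which is why the count is the number of admissible quadruples times $|A|^2$.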





The above results show that if $|A + A|$ is close to $|A|$, then $|A \cdot A|$ is close to
$|A|^2$. In the other direction, the best known results are due to Chang [49], who has
established that if $|A \cdot A| \leq K|A|$ then $|A + A| \geq 36^{-K}|A|^2$, and more generally
$|hA| \geq (2h^2 - h)^{-hK}|A|^h$ for all $h \geq 2$. Those arguments are not as elementary
as those presented here, relying instead on a result of Freiman (Theorem 5.13)
and the machinery of $\Lambda(p)$ constants from Section 4.5 in order to get good lower
bounds on $|hA|$. See [49] for further details and some history of the problem.


Exercises 


8.3.1 Show that the Erdős-Szemerédi conjecture for sets of integers is equivalent
to the corresponding conjecture for sets of rationals. Show that the
conjecture for sets of reals is equivalent to the conjecture for sets of
algebraic integers. It is not known whether the conjecture for reals is
equivalent to the conjecture for (rational) integers.

8.3.2 Let $A$, $B$ be additive sets of real numbers with $|A|, |B| \geq 2$. Show that
$\left|\frac{A - A}{B - B}\right| = \Omega(|A||B|)$. (Hint: apply Beck's theorem to $P = A \times B$.) In
particular, in the notation of Section 2.8 we have $|Q[A]| = \Omega(|A|^2)$;
compare this with Corollary 2.51 and Corollary 2.52.

8.3.3 Let $A$, $B$, $C$ be additive sets of real numbers. Show that $|A + B \cdot C| =
\Omega(|A|^{1/2}|B|^{1/2}|C|^{1/2})$. (Hint: if $|B| \leq |C|$, apply the Szemerédi-Trotter
theorem with $P := B \times (A + B \cdot C)$ and $L$ equal to those lines with slope
in $C$ and $y$-intercept in $A$.) Conclude that $|h(B \cdot C)| = \Omega((|B||C|)^{1 - 1/2^h})$
for all $h \geq 1$.

8.3.4 Generalize Theorem 8.14 by demonstrating the inequality $|A + B||B \cdot C|
= \Omega(\min(|A||B||C|, |A|^{1/2}|B|^{3/2}|C|^{1/2}))$.

8.3.5 [79] Let $f : \mathbb{R} \to \mathbb{R}$ be any strictly convex function, and let $A$ be
an additive set of reals. Show that $|A + A||f(A) + f(A)| = \Omega(|A|^{5/2})$.
(Hint: note that Theorem 8.14 addresses the case when $f(x) = \log x$; this
should suggest a proof for the general case.)




8.3.6 Let $n$ be a large integer. Using Theorem 1.6, show that all but at most
$o(n^2)$ elements of $[1, n] \cdot [1, n]$ have $(2 + o(1)) \log \log n$ prime divisors.
(Note that the convergence of the sum $\sum_k \frac{1}{k^2}$ shows that one can
neglect those elements which have a large square factor.) Conclude that
$|[1, n] \cdot [1, n]| = o(n^2)$. For much more precise estimates, see [106].


8.4 Cell decompositions and the distinct distances problem 


Given a finite point set $P \subset \mathbb{R}^d$, let $g(P) := |\{|x - y| : x, y \in P, x \neq y\}|$ denote the
number of distinct distances between the elements of $P$. Define $g_d(n) :=
\min_{P \subset \mathbb{R}^d : |P| = n} g(P)$. The well-known distinct distances problem of Erdős, posed in
1946 [83], asks to determine the correct rate of growth of $g_d(n)$ in $n$ for each fixed
$d$; this question remains open even when $d = 2$. (Clearly we have $g_1(n) = n - 1$.)

By considering the progression $P = [1, n^{1/d}]^d$ it is easy to see that $g_d(n) =
O(d n^{2/d})$. Erdős and many other researchers conjecture that $g_d(n)$ is close to this
upper bound; in particular it is conjectured that $g_d(n) = \Omega_{\varepsilon,d}(n^{2/d - \varepsilon})$ for any $\varepsilon > 0$.
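The upper bound coming from the lattice example is easy to test by brute force. The following minimal sketch (function names are hypothetical) counts distinct non-zero distances in the grid $[1, m]^d$, comparing squared distances so that the arithmetic stays exact.

```python
from itertools import product

def distinct_distances(points):
    """Number of distinct non-zero distances, compared via exact squared lengths."""
    pts = list(points)
    squared = {sum((x - y) ** 2 for x, y in zip(p, q))
               for i, p in enumerate(pts) for q in pts[i + 1:]}
    return len(squared)

def grid(m, d):
    """The progression [1, m]^d as a list of integer points."""
    return list(product(range(1, m + 1), repeat=d))

for m in (3, 5, 10):
    n = m * m
    print(n, distinct_distances(grid(m, 2)), 2 * n)
```

Every squared distance in $[1, m]^d$ is a sum of $d$ squares, each at most $m^2$, so the count is at most $d m^2 = d n^{2/d}$ with $n = m^d$; the printed values stay below that bound.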

It is quite easy to establish the lower bound $g_d(n) = \Omega_d(n^{1/d})$; see Exercise
8.4.2. There is a series of improvements for the case $d = 2$, due to Moser [252],
Chung [57], Chung-Szemerédi-Trotter [60], Székely [342], Solymosi-Tóth [328],
Tardos [353], Katz and Tardos [197]. The most current bound is $g_2(n) = \Omega(n^{0.8635})$
[197], using the approach in [328] combined with clever entropy arguments. Here
we will present a slightly weaker bound due to Székely; this argument forms the base
for all the subsequent bounds mentioned above.


Theorem 8.17 [342] We have $g_2(n) = \Omega(n^{4/5})$.


Proof Let P be a set of n points in R?. Define an isosceles triangle to be a 
triple (p, q, q’) of distinct points in P such that |p — q| = |p — q’|. We say that 
the isosceles triangle is narrow if the circular arc from q to q’ with center at p 
contains no other points in P. We refer to the pair (q, q’) as the base of the isosceles 
triangle, and p as the apex. For any k > 1, we say that a pair (q, q’) is k-rich if it 
is the base of at least k narrow isosceles triangles, and k-poor otherwise. 

Let $N$ be the number of narrow isosceles triangles $(p, q, q')$. We shall apply a
double counting argument to $N$. We begin with the lower bound. There are $|P|$
choices for $p$. Given $p$, the remaining $|P| - 1$ points in $P$ are contained in at most
$g_2(|P|)$ circles centered at $p$. Let $\mathcal{C}$ be the collection of such circles; then we easily
verify that the number of narrow isosceles triangles with apex $p$ is

$$\sum_{C \in \mathcal{C} : |C \cap P| \geq 2} 2|C \cap P|.$$

Since $\sum_{C \in \mathcal{C}} |C \cap P| = |P| - 1$, we can write the above quantity as $2|P| -
O(g_2(|P|))$. Summing over all $p$ we conclude that

$$N \geq 2|P|^2 - O(|P| g_2(|P|)).$$


Now we obtain the upper bound. We let $k \geq 1$ be a parameter to be chosen later,
and split $N = N_{rich} + N_{poor}$, where $N_{rich}$ (resp. $N_{poor}$) is the number of narrow
isosceles triangles with a $k$-rich (resp. $k$-poor) base. Observe that if $(p, q, q')$ is
an isosceles triangle with a $k$-rich base, then the perpendicular bisector $l$ of $q, q'$
contains $p$ and also contains at least $k$ points from $P$. Conversely, for fixed $l$ and
$p$ there are at most $4g_2(|P|)$ pairs $(q, q')$ with perpendicular bisector $l$ for which
$(p, q, q')$ is a narrow isosceles triangle; this can be seen by covering the points in
$P \setminus \{p\}$ by at most $g_2(|P|)$ circles and observing that each circle contributes at
most four such triangles. Applying Exercise 8.2.5 we conclude

$$N_{rich} = O\left(g_2(|P|)\left(\frac{|P|^2}{k^2} + |P| \log |P|\right)\right).$$

As for the poor triangles, consider the multi-graph drawing $G$ whose vertices are $P$
and whose edges are the circular arcs corresponding to narrow isosceles triangles
$(p, q, q')$ with a $k$-poor base. This graph has $|P|$ vertices and $N_{poor}$ edges, and has
edge multiplicity at most $k$. Thus by Exercise 8.1.5, we have

$$N_{poor} = O(k|P| + k^{1/3}|P|^{2/3} \mathrm{cross}(G)^{1/3}).$$

On the other hand, since the drawing of $G$ is contained in at most $|P| g_2(|P|)$ circles
(each center $p \in P$ contributing at most $g_2(|P|)$ circles), and any two circles cross
in at most two points, we see that $\mathrm{cross}(G) \leq 2(|P| g_2(|P|))^2$; thus

$$N_{poor} = O(k|P| + k^{1/3}|P|^{4/3} g_2(|P|)^{2/3}).$$

Combining our upper bounds for $N_{poor}$ and $N_{rich}$ with the lower bound for $N$, we
obtain

$$|P|^2 \leq O(|P| g_2(|P|)) + O\left(g_2(|P|)\left(\frac{|P|^2}{k^2} + |P| \log |P|\right)\right) + O(k|P| + k^{1/3}|P|^{4/3} g_2(|P|)^{2/3}).$$

We optimize this by setting $k := c|P|^{2/5}$ for some small constant $c > 0$, and some
elementary algebra then gives $g_2(|P|) = \Omega(|P|^{4/5})$ as desired.
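The elementary algebra in this last step can be spelled out as follows. This is a sketch and not taken from the text: it writes $n$ for $|P|$ and $g$ for $g_2(|P|)$, introduces an auxiliary small constant $\epsilon > 0$ (an assumption of the sketch), and supposes for contradiction that $g \leq \epsilon n^{4/5}$.

```latex
% Notation (assumed for this sketch): n = |P|, g = g_2(|P|), k = c n^{2/5},
% and the contradiction hypothesis g <= \epsilon n^{4/5}.
\begin{align*}
  g \cdot \frac{n^2}{k^2}
    &= \frac{g \, n^{6/5}}{c^2} \;\le\; \frac{\epsilon}{c^2} \, n^2, \\
  k^{1/3} n^{4/3} g^{2/3}
    &= c^{1/3} n^{2/15 + 4/3} g^{2/3}
     \;\le\; c^{1/3} \epsilon^{2/3} \, n^{22/15 + 8/15}
     \;=\; c^{1/3} \epsilon^{2/3} \, n^2, \\
  k n &= c \, n^{7/5}, \qquad
  g \, n \;\le\; \epsilon \, n^{9/5}, \qquad
  g \, n \log n \;\le\; \epsilon \, n^{9/5} \log n .
\end{align*}
```

For fixed $c$, taking $\epsilon$ small enough (depending on $c$ and the implied constants) and $n$ large makes every term on the right-hand side of the combined inequality a small multiple of $n^2$, which contradicts that inequality; hence $g > \epsilon n^{4/5}$.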














The above argument generalizes to many other metrics than the Euclidean
metric; see [130]. To go beyond $n^{4/5}$, however, it seems that one needs to use
the finer arithmetic structure of Euclidean geometry. Very roughly, the results of 
[328], [353], [197] proceed by analyzing the perpendicular bisectors of all of the 
narrow isosceles triangles (p, q, q’) with a given apex p and a k-rich base; note 
these bisectors are k-rich in the sense that they contain at least k points in P. Using 




polar coordinates around p, one can parameterize these bisectors using the sum 
of the angles of q and q’. One can then use some bounds on partial sum sets to 
obtain non-trivial lower bounds on the number of k-rich lines through p, which 
can then be combined with Exercise 8.2.5 to obtain an improvement to Theorem 
8.17; see [328]. The further refinements in [353], [197] proceed similarly, but with 
a slightly weaker notion of narrow isosceles triangle, allowing the circular arc 
connecting q with q’ to contain O(1) other points from P. This provides several 
further partial sum sets to yield slightly better lower bounds on the number of 
k-rich lines through p. 

In the higher-dimensional case $d > 2$ much less is known. However there are
some reasonable results if one imposes some uniform distribution on the points.
Let $Q$ be the standard unit cube in $\mathbb{R}^d$, centered at the origin. Let us call a finite
set $P \subset \mathbb{R}^d$ homogeneous if $P \subset |P|^{1/d} \cdot Q$, and $|P \cap (x + Q)| = O_d(1)$ for all
$x \in \mathbb{R}^d$. A good example of a homogeneous set can be obtained by starting with
the progression $[1, |P|^{1/d}]^d$ and perturbing each element of this progression by an
arbitrary bounded displacement.

A weakened version of Erdős' original problem asks for the number of distinct
distances in a homogeneous set. Homogeneous sets are interesting for at least
two reasons. First, the best known upper bounds for the distance problem come
from homogeneous sets. Second, homogeneous sets play an important role in
analysis (see e.g. [189]). In this section we prove


Theorem 8.18 [326] Let $P \subset \mathbb{R}^d$ be a homogeneous set. Then
$g_d(P) = \Omega_d(|P|^{2/d - 1/d^2})$.


This should be compared with the (homogeneous) lattice example $P =
[1, |P|^{1/d}]^d$, which gives $g_d(P) = O_d(|P|^{2/d})$. As in Theorem 8.17, the proof starts
by a double counting argument applied to narrow isosceles triangles. However, 
crossing number and Szemerédi—Trotter type results are not available in higher 
dimensions, and one instead uses the more flexible technique of cell decompo- 
sition. Given a large, complex incidence system S, we try to break it into many 
pieces, each of which has only a small number of incidences. After the decom- 
position is achieved, an (often tricky) double counting argument concerning the 
number of a properly defined object yields fairly efficient bounds. 


Proof We may of course assume $|P| \geq 2$. By hypothesis, $P$ is contained in the
cube $|P|^{1/d} \cdot Q$. Let $1 \leq r \leq |P|^{1/d}$ be an integer to be chosen later. By using
hyperplanes parallel to the coordinate axes, we can partition $|P|^{1/d} \cdot Q = C_1 \cup
\cdots \cup C_{r^d}$, where each $C_i$ is a cube of side-length $|P|^{1/d}/r$; we assign the boundary
points of these cubes arbitrarily to one of the cubes of the partition. We refer to
the cubes $C_i$ as cells.




For each $p \in P$, the set $P \setminus \{p\}$ is contained in the union of at most $g_d(P)$
spheres centered at $p$. We denote by $\mathcal{S}_p$ the set of these spheres. For each sphere
$S \in \mathcal{S}_p$, let $\mathcal{C}_S$ denote all the cells $C_i$ which intersect $S$ and which contain at least
one point of $P$; from elementary geometry we see that $|\mathcal{C}_S| = O_d(r^{d-1})$.

We now apply a double counting argument to the quantity

$$N := |\{(p, S, C, q, q') : p \in P; S \in \mathcal{S}_p; C \in \mathcal{C}_S; q, q' \in P \cap S \cap C; q \neq q'\}|;$$

informally, $N$ counts the number of isosceles triangles in $P$ where the base points
lie in the same cell (cf. the proof of Theorem 8.17). We begin with an upper bound.
Observe that there are $r^d$ possible cells $C$. A cell has side-length $|P|^{1/d}/r \geq 1$, so
by homogeneity it contains $O_d(|P|/r^d)$ points in $P$. Thus there are $O_d(|P|/r^d)^2$
possible pairs $q, q'$ that can be associated to $C$. For each such pair, observe that $p$
must lie on the hyperplane bisecting $q$ and $q'$ (since $q, q'$ lie on a sphere centered at
$p$). By homogeneity again, this hyperplane contains at most $O_d(|P|^{(d-1)/d})$ elements
of $P$. Finally, once $p, q, q'$ are fixed, $S$ is completely determined. Putting this all
together we obtain the upper bound

$$N \leq r^d \, O_d(|P|/r^d)^2 \, O_d(|P|^{(d-1)/d}) = O_d(|P|^{3 - 1/d} r^{-d}). \qquad (8.6)$$


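Now we obtain a lower bound. Observe the explicit formula

$$N = \sum_{p \in P} \sum_{S \in \mathcal{S}_p} \sum_{C \in \mathcal{C}_S} \left(|P \cap S \cap C|^2 - |P \cap S \cap C|\right).$$

From Cauchy-Schwarz we have

$$\sum_{C \in \mathcal{C}_S} |P \cap S \cap C|^2 \geq \frac{|P \cap S|^2}{|\mathcal{C}_S|}$$

and

$$\sum_{S \in \mathcal{S}_p} |P \cap S|^2 \geq \frac{(|P| - 1)^2}{g_d(P)}$$

and hence

$$N \geq \sum_{p \in P} \left(\frac{(|P| - 1)^2}{g_d(P) \max_{S \in \mathcal{S}_p} |\mathcal{C}_S|} - (|P| - 1)\right).$$

Since $|\mathcal{C}_S| = O_d(r^{d-1})$, we conclude

$$N \geq \Omega_d\left(\frac{|P|^3}{g_d(P) r^{d-1}}\right) - |P|^2.$$

Combining this with (8.6) and rearranging, we conclude

$$g_d(P) = \Omega_d\left(\frac{|P|^3}{r^{d-1}\left(|P|^{3 - 1/d} r^{-d} + |P|^2\right)}\right).$$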







We optimize this by selecting $r$ to be the nearest integer to $|P|^{1/d - 1/d^2}$, and the claim
follows.
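The exponent bookkeeping behind this choice of $r$ can be checked with exact rational arithmetic. The sketch below is illustrative only; the function name is hypothetical, and it assumes the two competing terms in the denominator are $|P|^{3 - 1/d} r^{-d}$ and $|P|^2$, multiplied by $r^{d-1}$. It balances these terms and reads off the resulting exponent of $|P|$.

```python
from fractions import Fraction

def homogeneous_exponent(d):
    """Exponent e such that balancing the two denominator terms gives
    g_d(P) = Omega(|P|^e), tracking only powers of |P|."""
    d = Fraction(d)
    rho = (1 - 1 / d) / d              # r = |P|^rho
    # With this rho the two terms |P|^(3 - 1/d) r^(-d) and |P|^2 have equal exponents.
    assert (3 - 1 / d) - d * rho == 2
    # Numerator |P|^3 divided by r^(d-1) |P|^2.
    return 3 - ((d - 1) * rho + 2)

for d in range(2, 7):
    print(d, homogeneous_exponent(d), Fraction(2, d) - Fraction(1, d * d))
```

For every $d$ the two printed fractions agree, so the balanced choice $r = |P|^{1/d - 1/d^2}$ yields the exponent $2/d - 1/d^2$.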














For $d \geq 3$ and general (inhomogeneous) sets, little was known for a long
time, as the method based on the Szemerédi-Trotter theorem cannot be generalized
to dimension larger than 2. Clarkson, Edelsbrunner, Guibas, Sharir and Welzl
[63] proved that $g_3(n) = \Omega(n^{1/2})$. In 2002, Aronov, Pach, Sharir and Tardos [14]
proved that $g_3(n) = \Omega_\varepsilon(n^{77/141 - \varepsilon})$ for any $\varepsilon > 0$. More generally, they proved
that $g_d(n) = \Omega_{d,\varepsilon}(n^{1/(d - 90/77) - \varepsilon})$ for any $d \geq 3$. This result gives a non-trivial
improvement for small $d$, compared to the previous bound $n^{1/d}$. On the other
hand, as $d \to \infty$, the exponent $1/(d - 90/77) - \varepsilon$ converges to $1/d$, rather than
to the conjectured bound $2/d$.

Very recently, Solymosi and Vu [327] managed to show that the exponent $2/d$
is best possible to top order, in the sense that it cannot be replaced by $(2 - \varepsilon +
o_{d \to \infty}(1))/d$ for any positive constant $\varepsilon > 0$. More precisely, they showed that

$$g_d(n) = \Omega_d\left(n^{\frac{2}{d} - \frac{2}{d(d+2)}}\right)$$

for all $d \geq 4$, and also $g_3(n) = \Omega(n^{0.5643})$.

This result and the previous bound of Aronov et al. were proved using the 
decomposition method combined with other arguments. Unlike the homogeneous 
case, the decomposition used here is more sophisticated and was first developed by 
Chazelle and Friedman [52] (see also [245]), motivated by problems in geometric 
searching in computer science. Let us conclude this section by briefly discussing 
this result. 

One of the main techniques for doing a search is divide-and-conquer. In many
problems, the situation looks as follows: given a set $B$ of hyperplanes (of
co-dimension 1) in $\mathbb{R}^d$, one would like to partition $\mathbb{R}^d$ into not too many parts so that
each part intersects only few hyperplanes.


Definition 8.19 A hyperplane $H$ strongly intersects a set $P$ if $H \cap P$ is not empty
and $P$ has a point on both sides of $H$.


Lemma 8.20 Let $B$ be a set of $k$ hyperplanes in $\mathbb{R}^d$. For any $1 \leq r \leq k$, one can
partition $\mathbb{R}^d$ into $r$ sets $P_1, \ldots, P_r$ such that for each $1 \leq i \leq r$, there are only
$O(k/r^{1/d})$ planes which strongly intersect $P_i$.


The bound $O(k/r^{1/d})$ is best possible; the hidden constants in $O$ depend on $d$ but
not on $r$. One can also guarantee that the sets $P_i$ are generalized simplices. Strong
intersection actually means intersection with the interior (see [245]). Let us now
consider a slightly more complex situation in which, besides $B$, we also have a set $A$ of
$n$ points. We can require, in addition, that each part contains not too many points.




Lemma 8.21 Let $A$ be a set of $n$ points and $B$ be a set of $k$ hyperplanes in $\mathbb{R}^d$.
For any $1 \leq r \leq k$, one can partition $\mathbb{R}^d$ into $r$ sets $P_1, \ldots, P_r$ such that for each
$1 \leq i \leq r$, $|P_i \cap A| \leq 2n/r$ and $O(k/r^{1/d})$ planes strongly intersect $P_i$.


Lemma 8.21 is not restricted to hyperplanes. It still holds if we replace a family 
of hyperplanes by a family of surfaces satisfying certain topological conditions. 
In particular, the lemma holds if we replace hyperplanes by (full-dimensional) 
spheres (see Section 6.5 of [245]). As an analog of Lemma 8.21, we obtain the 
following lemma, which was actually used in [327]. 


Definition 8.22 A sphere $S$ strongly intersects a set $P$ if $S \cap P$ is not empty and
$P$ has a point on both sides of $S$.


Lemma 8.23 Let $A$ be a set of $n$ points and $B$ be a set of $k$ spheres in $\mathbb{R}^d$. For any
$1 \leq r \leq k$, one can partition $\mathbb{R}^d$ into $r$ sets $P_1, \ldots, P_r$ such that for each $1 \leq i \leq r$,
$|P_i \cap A| = O(n/r)$ and there are only $O(k/r^{1/d})$ spheres which strongly intersect
$P_i$.


It would be very desirable to have a finite field analog of the above lemmas. 
Here is the simplest form of the problem: given a set of lines (or simple curves) 
on a finite plane, we would like to partition the plane into a few parts so that each 
part intersects only a few lines. The main obstacle here is that one needs to find 
a proper replacement for the topological condition of strong intersection. This 
condition was used to rule out extremal cases such as when all the hyperplanes go 
through the same point. 


Exercises 


8.4.1 Let $A$ be a finite non-empty set of reals. Show that $|k((A - A)^{\wedge 2})| \geq
g_k(|A|^k)$, where $X^{\wedge 2} := \{x^2 : x \in X\}$ is the set of squares in $X$, and $kX :=
X + \cdots + X$ is the $k$-fold sum set of $X$. Thus progress on the Erdős
distance problem is linked to progress on questions of sum-product type;
see [43] for some further development of this idea.

8.4.2 [83] Let $x_1, \ldots, x_d$ be $d$ points in general position in $\mathbb{R}^d$. Show that if
$x \in \mathbb{R}^d$, then the $d$ distances $|x - x_1|, \ldots, |x - x_d|$ determine $x$ up to
a multiplicity of $O_d(1)$. Use this to show that $g_d(n) = \Omega_d(n^{1/d})$ for all
$n$. (Note that the degenerate case in which many points lie in a lower-dimensional
space can be dealt with by an induction argument.)

8.4.3 [326] (Rich lines in three dimensions) Let $P$ be a homogeneous set in
$\mathbb{R}^3$. Show that $\sum_{l : |l \cap P| \geq k} |l \cap P| = O(|P|^2/k^3)$ for all $k \geq 2$.

8.4.4 [326] Let A be a homogeneous set of cardinality n in R? and P be a 
collection of D pairwise non-parallel planes. Then there is a plane P € P 




such that the orthogonal projection of A on P has min(Q(D!/7n?/3), n/4) 
elements. 

8.4.5 [326] (Beck’s lemma for homogeneous sets in R?) There is a positive 
constant K such that the following holds. Let B be a homogeneous set of 
s points in R? and F be a set of f pairs of points of B. At least f/2 pairs 
of F are on lines incident to at most K F7 points of B. 

8.4.6 Let $n$ be a large number, and let $P \subset \mathbb{R}^2$ be the set $P := [1, \sqrt{n}] \times
[1, \sqrt{n}]$. Show that $|P| = \Theta(n)$ and $g_2(P) = o(n)$. (Hint: for primes $p \equiv
3 \pmod 4$, any number divisible by $p$ but not by $p^2$ cannot be written
as the sum of two integer squares. Use this fact for all small $p$ (say
$p < \log \log n$) and the Chinese remainder theorem to improve upon the
trivial bound of $g_2(P) = O(n)$.) Conclude in particular that $g_2(n) = o(n)$.

8.4.7 The purpose of this exercise is to sketch an alternative proof of the
Szemerédi-Trotter theorem via cell decomposition. Let $P$, $L$ be collections
of points and lines, and let $1 \leq r \leq |L|/2$. Choose $r$ lines from $L$
at random; show that this divides the plane into $O(r^2)$ regions (known
as "cells"), and that all the other lines in $L$ intersect at most $O(r)$ of
these cells. Show that there are at most $O(r|L|)$ incidences $(p, l)$
with $p$ lying on the boundary of one or more cells. By applying (8.2) to
the points and lines incident to the interior of each cell, and then summing
using the Cauchy-Schwarz inequality, show that there are at most
$O(r|L| + r^{-1/2}|P||L|^{1/2})$ incidences $(p, l)$ with $p$ in the interior of one
of the cells. Optimize this in $r$ to conclude the Szemerédi-Trotter theorem
up to an absolute constant.


8.5 The sum-product problem in other fields 


A natural extension of the sum-product problem is to consider sets from fields
and rings other than $\mathbb{R}$. One example (when $\mathbb{R}$ is replaced by $\mathbb{Z}_p$ for a prime $p$)
was considered in an earlier chapter. In this section, we consider the case when $\mathbb{R}$ is
replaced by the set of complex numbers.

One way to attack the problem is to prove a complex version of the Szemerédi-Trotter
theorem and then repeat the proofs of Theorems 8.14 and 8.15. While it is
believed that the statement of the Szemerédi-Trotter theorem holds for complex lines
and points, proving it is not easy as the technique using the crossing number no
longer applies (see however the recent announcement by Tóth [368]).

In the following, we show that using a clever double counting argument, one
can extend Elekes's result to complex numbers. In fact, the argument, which is
due to Solymosi [325], is effective for several other number fields as well. (See
the remark at the end of the proof.)


Theorem 8.24 [325] For any finite non-empty sets of complex numbers $A$, $B$,
and $Q$,

$$|A + B| \cdot |A \cdot Q| = \Omega(|A|^{3/2}|B|^{1/2}|Q|^{1/2}).$$

By setting $Q = B = A$, it follows immediately that

$$|A + A| \cdot |A \cdot A| = \Omega(|A|^{5/2})$$

and

$$|A + A| + |A \cdot A| = \Omega(|A|^{5/4});$$

thus this theorem generalizes Theorem 8.14.


Proof We may assume $|A| \geq 2$ and $0 \notin Q$. From elementary algebra we observe
that the map

$$(a, a', b, q) \mapsto (a + b, a' + b, aq, a'q)$$

is one-to-one from $A \times A \times B \times Q$ to $(A + B) \times (A + B) \times (A \cdot Q) \times (A \cdot Q)$
provided that we exclude the diagonal $a = a'$. This observation by itself is only
enough to obtain the trivial bound $|A + B| \cdot |A \cdot Q| = \Omega(|A||B|^{1/2}|Q|^{1/2})$. However
we can do better by exploiting the intuitive observation that if $a'$ is close to
$a$, then $a' + b$ is close to $a + b$ and $aq$ is close to $a'q$.
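The injectivity of this map away from the diagonal comes with an explicit inverse, which the following minimal sketch makes concrete. The function names are hypothetical, and exact rationals are used in place of general complex numbers so that the round trip can be tested exactly; the same formulas apply over any field.

```python
from fractions import Fraction
from itertools import product

def forward(a, a2, b, q):
    """The map (a, a', b, q) -> (a + b, a' + b, a q, a' q)."""
    return (a + b, a2 + b, a * q, a2 * q)

def inverse(s, s2, m, m2):
    """Recover (a, a', b, q) from (a+b, a'+b, aq, a'q), assuming a != a' and q != 0."""
    diff = s - s2            # equals a - a', non-zero off the diagonal
    q = (m - m2) / diff      # (aq - a'q) / (a - a')
    a, a2 = m / q, m2 / q    # uses q != 0
    return (a, a2, s - a, q)

A = [Fraction(x) for x in (1, 2, 5)]
B = [Fraction(x) for x in (0, 3)]
Q = [Fraction(x) for x in (2, -1)]
ok = all(inverse(*forward(a, a2, b, q)) == (a, a2, b, q)
         for a, a2, b, q in product(A, A, B, Q) if a != a2)
print(ok)
```

The inverse divides by $a - a'$ and by $q$, which is exactly why the diagonal $a = a'$ and the value $q = 0$ have to be excluded.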

More precisely, for each $a \in A$, define the nearest neighbor $a'$ of $a$ to be an
element of $A \setminus \{a\}$ which minimizes the distance $|a - a'|$. (If there is more than
one candidate for nearest neighbor, choose arbitrarily.) We refer to $(a, a')$ as a
neighboring pair; thus there are $|A|$ neighboring pairs. We caution that if $(a, a')$ is
a neighboring pair then $(a', a)$ is not necessarily a neighboring pair also.

Call a quadruple $(a, a', b, q)$ good if $(a, a')$ is a neighboring pair, $b \in B$ and
$q \in Q$, and one has the closeness properties

$$|\{u \in A + B : |a + b - u| \leq |a - a'|\}| \leq \frac{28|A + B|}{|A|} \qquad (8.7)$$

and

$$|\{v \in A \cdot Q : |aq - v| \leq |aq - a'q|\}| \leq \frac{28|A \cdot Q|}{|A|}. \qquad (8.8)$$

Informally, (8.7) and (8.8) assert that $a' + b$ is a fairly close neighbor of $a + b$ in
$A + B$, and similarly $a'q$ is a fairly close neighbor of $aq$ in $A \cdot Q$. We will apply
a double counting argument to $N$, the number of good quadruples.




First we establish a lower bound. For each $a \in A$ let $D_a := \{z \in \mathbb{C} : |z - a| \leq
|a' - a|\}$ be the disk of radius $|a' - a|$ centered at $a$. A simple geometric argument
(which we leave as an exercise) shows that any complex number $z$ can be contained
in at most seven of these disks. In particular for any $b \in B$ we have

$$\sum_{a \in A} |\{u \in A + B : |a + b - u| \leq |a - a'|\}| = \sum_{z \in A + B - b} |\{a \in A : z \in D_a\}| \leq 7|A + B|$$

and similarly for any $q \in Q$

$$\sum_{a \in A} |\{v \in A \cdot Q : |aq - v| \leq |aq - a'q|\}| = \sum_{z \in A \cdot Q/q} |\{a \in A : z \in D_a\}| \leq 7|A \cdot Q|.$$

If we thus fix $b$ and $q$ and choose $a \in A$ uniformly at random, a simple application
of Markov's inequality then shows that $(a, a', b, q)$ will be good with probability
at least $1/2$. This shows that

$$N \geq \frac{|A|}{2}|B||Q|.$$

Now we establish an upper bound. Recall that the quadruple $(a, a', b, q)$ is uniquely
determined by the quadruple $(a + b, a' + b, aq, a'q)$. There are $|A + B|$ choices
for $a + b$ and $|A \cdot Q|$ choices for $aq$. For fixed $a + b$, we see from (8.7) that there
are at most $\frac{28|A + B|}{|A|}$ elements of $A + B$ which are closer to or equally distant from
$a + b$ than $a' + b$, and thus there are at most $\frac{28|A + B|}{|A|}$ values of $a' + b$. Similarly
there are at most $\frac{28|A \cdot Q|}{|A|}$ values of $a'q$. This gives the upper bound

$$N \leq |A + B||A \cdot Q| \cdot \frac{28|A + B|}{|A|} \cdot \frac{28|A \cdot Q|}{|A|}.$$

Combining this with the lower bound, we obtain the claim.
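The geometric fact used in the lower bound, that no complex number lies in more than seven of the nearest-neighbor disks, can be probed numerically. This is only an illustrative sketch with hypothetical function names: it samples random points with floating-point arithmetic, so it is a sanity check rather than a proof.

```python
import random

def nearest_neighbor_radius(A):
    """Map each a in A to |a - a'|, where a' is a nearest neighbor of a among the other points."""
    return {a: min(abs(a - b) for b in A if b != a) for a in A}

def max_disk_multiplicity(A, test_points):
    """Largest number of closed disks D_a = {z : |z - a| <= |a' - a|} that
    contain a single test point."""
    radius = nearest_neighbor_radius(A)
    return max(sum(1 for a in A if abs(z - a) <= radius[a]) for z in test_points)

random.seed(0)
A = [complex(random.random(), random.random()) for _ in range(200)]
Z = A + [complex(random.random(), random.random()) for _ in range(500)]
print(max_disk_multiplicity(A, Z))
```

Each point of $A$ lies in its own disk, so the multiplicity is at least one; the bound of seven is what makes the Markov inequality step work with the constant 28.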














Remark 8.25 A similar argument works for quaternions and for other hypercomplex
numbers. In general, if $T$ and $Q$ are sets of similarity transformations and $A$ is
a set of points in space such that, from any quadruple $(t(p_1), t(p_2), q(p_1), q(p_2))$,
the elements $t \in T$, $q \in Q$, and $p_1 \neq p_2 \in A$ are uniquely determined, then
$c|A|^{3/2}|T|^{1/2}|Q|^{1/2} \leq |T(A)| \cdot |Q(A)|$, where $c$ depends on the dimension of the
space only.


To conclude this section, let us describe a recent result of Chang, who investi- 
gates the sum-product problem for matrices [51]. 


Theorem 8.26 There is a function $\Phi(n)$ tending to infinity with $n$ such that the
following holds. Let $d$ be a fixed integer and $A$ be a finite set of $d \times d$ real matrices
such that for any two different elements $M$ and $M'$ of $A$, $\det(M - M') \neq 0$. Then

$$|A + A| + |A \cdot A| \geq \Phi(|A|)|A|.$$




Theorem 8.27 For every $d$ there is a positive constant $\varepsilon = \varepsilon(d)$ such that the
following holds. Let $A$ be a finite set of $d \times d$ real, symmetric matrices. Then

$$|A + A| + |A \cdot A| \geq |A|^{1 + \varepsilon}.$$


The proofs of these theorems are more complicated than those presented here 
and we refer the readers to [51] for details. 


Exercise 


8.5.1 With the notation in the proof of Theorem 8.24, show that every complex
number is contained in at most seven of the disks $D_a$. (Hint: show that if
$z$ is contained in both $D_a$ and $D_{a'}$ with $a, a', z$ distinct, then $a, a'$ subtend
an angle of at least 60° with respect to $z$.)


9 Algebraic methods


In most of this book we have studied additive combinatorics problems in an ambient 
group Z, relying primarily on the additive structure of Z (as manifested for instance 
in the Fourier transform). However, in many cases the ambient group is in fact a field 
F , and thus supports a number of special functions, in particular polynomials. One 
can then use tools from algebraic geometry to exploit these polynomial structures; 
this is known as the polynomial method. One of the primary ideas here is to interpret 
an additive set (e.g. a sum set A + B) as the zero locus of one or more polynomials, 
possibly in several variables. One can then hope to control the size of such sets 
using results from algebraic geometry about the number and distribution of zeroes 
of polynomials. The most familiar example of such a theorem is the statement that 
a polynomial P(t) of one variable with degree d in a field F can have at most 
d zeroes; however for most applications we will need to study the zero locus of 
polynomial(s) in many variables. In this chapter we present four related tools and 
techniques from algebraic geometry which allow one to control such a zero locus. 
The first is the powerful combinatorial Nullstellensatz of Alon (Theorem 9.2),
which asserts that the zero locus of a polynomial $P(t_1, \ldots, t_n)$ cannot contain a
large box $S_1 \times \cdots \times S_n$ if a certain monomial coefficient of $P$ is non-vanishing; this
is particularly useful for obtaining lower bounds on the size of restricted sum sets
and similar objects. The second is the Chevalley-Warning theorem (Theorem 9.24),
which shows that under certain conditions the cardinality of a zero locus of multiple
polynomials must be a multiple of char($F$), the characteristic of the underlying
field. This is useful for demonstrating the existence of non-trivial solutions to a set
of polynomial equations in $F$. The third is Stepanov's method (see Section 9.7),
which obtains upper bounds on a set by using linear algebra methods to locate
a polynomial that vanishes to very high order at each of the elements of the set;
this has proven to be particularly useful for controlling additive combinations of
multiplicative subgroups of a finite field, and thus has application to sum-product
estimates. Finally we discuss divisibility criteria, which show that a polynomial
cannot have certain types of zeroes if some combination of its coefficients are
divisible (or not divisible) by $p$ in a certain manner; the most well known example of
this is Eisenstein's criterion (Exercise 9.8.2), but the combinatorial Nullstellensatz
can also be viewed as a statement of this type, and another example arises in
cyclotomic fields (Lemma 9.49). As an application of these criteria we present an
uncertainty principle for $\mathbb{Z}_p$ which gives a Fourier-analytic proof of the
Cauchy-Davenport inequality (Theorem 5.4).

Much of the theory pertains to arbitrary fields $F$. However, we will at times
need to focus on two special types of fields. The first are finite fields, of which the
primary example are the fields $F_p = \mathbb{Z}_p$ of prime order. We shall review the theory
of these fields in Section 9.4. The second are the cyclotomic fields, generated by
$p$th roots of unity; we shall review the theory of those fields in Section 9.8.

It is easy to see that in a field $F$, all non-zero elements have the same torsion as
the identity element 1. We refer to this torsion as the characteristic char($F$) of $F$;
it is either zero (if $F$ is torsion-free) or a prime $p$ (which is for instance the case
when $F$ is finite). Some of our results will only hold if the characteristic of $F$ is
sufficiently large (or equal to zero).


9.1 The combinatorial Nullstellensatz 


As is well known, a polynomial P € F[t] of one variable over a field F can have 
at most deg(P) zeroes, where deg( P) denotes the degree of P. Let us rewrite this 
fact as 


Lemma 9.1 Let $P \in F[t]$ be a polynomial of one variable over a field $F$ and
degree $d$ (thus the $t^d$ coefficient of $P$ is non-zero) and let $S$ be a subset of $F$ such
that $|S| > \deg(P)$. Then there exists $x \in S$ such that $P(x) \neq 0$.


We now present a powerful generalization of this fact to polynomials of several 
variables, namely the combinatorial Nullstellensatz of Alon [4]. 


Theorem 9.2 (Combinatorial Nullstellensatz) [4] Let $F$ be an arbitrary field,
let $P \in F[t_1,\ldots,t_n]$ be a polynomial of degree $d$ which contains a non-zero
coefficient at $t_1^{d_1} \cdots t_n^{d_n}$ with $d_1 + \cdots + d_n = d$, and let $S_1,\ldots,S_n$ be subsets of $F$
such that $|S_i| > d_i$ for all $1 \le i \le n$. Then there exist $x_1 \in S_1, \ldots, x_n \in S_n$ such
that $P(x_1,\ldots,x_n) \neq 0$.


Proof We induce on $n$. The case $n = 1$ is just Lemma 9.1. Now suppose that
$n \ge 2$ and the claim has already been proven for $n - 1$.

Let $g_n(t_n)$ be the polynomial of one variable

$$g_n(t_n) := \prod_{s_n \in S_n} (t_n - s_n) = t_n^{|S_n|} + \text{lower order terms}.$$

Thus $g_n$ has degree $|S_n|$ and the leading term is monic (i.e. it has coefficient 1). By
applying the long division algorithm to $P$, we may write

$$P(t_1,\ldots,t_n) = q_n(t_1,\ldots,t_n)\, g_n(t_n) + r_n(t_1,\ldots,t_n)$$

where the quotient $q_n$ is a polynomial of degree at most $d - |S_n|$, and the remainder
$r_n$ is a polynomial of degree at most $d$ such that no monomial contains a factor of
$t_n^{|S_n|}$, thus

$$r_n(t_1,\ldots,t_n) = \sum_{j=0}^{|S_n|-1} r_{n,j}(t_1,\ldots,t_{n-1})\, t_n^j.$$

We can expand $q_n g_n$ as $q_n t_n^{|S_n|}$ plus lower-order terms, of degree at most

$$\deg(q_n) + |S_n| - 1 \le (d - |S_n|) + |S_n| - 1 < d = d_1 + \cdots + d_n.$$

Thus the lower-order terms have a vanishing $t_1^{d_1} \cdots t_n^{d_n}$ coefficient. Since $|S_n| > d_n$,
we see that $q_n t_n^{|S_n|}$ also has a vanishing $t_1^{d_1} \cdots t_n^{d_n}$ coefficient. Thus by hypothesis on
$P$, the remainder $r_n$ must have a non-zero $t_1^{d_1} \cdots t_n^{d_n}$ coefficient. In particular, $r_{n,d_n}$
contains a non-zero $t_1^{d_1} \cdots t_{n-1}^{d_{n-1}}$ coefficient. Applying the induction hypothesis,
we can find $x_1 \in S_1, \ldots, x_{n-1} \in S_{n-1}$ such that $r_{n,d_n}(x_1,\ldots,x_{n-1})$ is non-zero.
Applying Lemma 9.1, we can then find $x_n \in S_n$ such that

$$r_n(x_1,\ldots,x_n) = \sum_{j=0}^{|S_n|-1} r_{n,j}(x_1,\ldots,x_{n-1})\, x_n^j \neq 0.$$

Since $g_n(x_n) = 0$, we thus have $P(x_1,\ldots,x_n) \neq 0$, as desired. □
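The statement of Theorem 9.2 can be checked by brute force on a small instance. The sketch below is only an illustration: the polynomial $t_1^2 t_2 + t_1 + 3$ over $F_7$ and the helper names `evaluate` and `nonvanishing_point` are assumptions made for this example, not part of the text. It searches every grid $S_1 \times S_2$ with $|S_1| = 3 > 2$ and $|S_2| = 2 > 1$ for a point where the polynomial does not vanish.

```python
from itertools import combinations, product

def evaluate(poly, point, p):
    """Evaluate a polynomial over F_p stored as {exponent tuple: coefficient}."""
    total = 0
    for exponents, coeff in poly.items():
        term = coeff
        for x, e in zip(point, exponents):
            term = term * pow(x, e, p) % p
        total = (total + term) % p
    return total

def nonvanishing_point(poly, sets, p):
    """Return a point of S_1 x ... x S_n at which poly is non-zero mod p, or None."""
    for point in product(*sets):
        if evaluate(poly, point, p) != 0:
            return point
    return None

# Hypothetical example: P(t1, t2) = t1^2 t2 + t1 + 3 over F_7 has degree 3 and a
# non-zero coefficient at t1^2 t2^1, so the theorem applies when |S_1| > 2, |S_2| > 1.
p = 7
P = {(2, 1): 1, (1, 0): 1, (0, 0): 3}
all_ok = all(
    nonvanishing_point(P, [S1, S2], p) is not None
    for S1 in combinations(range(p), 3)
    for S2 in combinations(range(p), 2)
)
print(all_ok)
```

The size condition cannot be dropped: the one-variable polynomial $t^2 - t$ vanishes on all of $S = \{0, 1\}$, where $|S| = 2$ equals the degree.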





For an explanation of the terminology "combinatorial Nullstellensatz", see
Exercise 9.1.3. Based on the combinatorial Nullstellensatz, Alon, Nathanson and
Ruzsa developed the so-called polynomial method, which is a very powerful tool
for proving bounds concerning cardinalities of sum sets. The next several sections
contain various applications of this method.


Exercises 


9.1.1 (Schwartz–Zippel lemma) Let $F$ be a field, let $Q \in F[t_1,\ldots,t_n]$ be a
non-zero polynomial of $n \ge 1$ variables, and let $S$ be a non-empty finite
subset of $F$. Let $x_1,\ldots,x_n$ be elements of $S$ chosen independently at
random. Then

$$\mathbf{P}(Q(x_1,\ldots,x_n) = 0) \le \frac{\deg(Q)}{|S|}.$$

(Hint: modify the induction argument used to prove the Nullstellensatz.)

9.1.2 Let $F$ be a field, let $d_1,\ldots,d_n \ge 0$, and let $P \in F[t_1,\ldots,t_n]$ be a non-zero
polynomial such that every monomial that occurs in $P$ divides
$t_1^{d_1} \cdots t_n^{d_n}$. Show that there exist functions $f_{i,1},\ldots,f_{i,d_i} : F^{i-1} \to F$
for each $1 \le i \le n$ such that

$$\{(x_1,\ldots,x_n) \in F^n : P(x_1,\ldots,x_n) = 0\} \subseteq \bigcup_{i=1}^{n} \bigcup_{j=1}^{d_i} \{(x_1,\ldots,x_n) \in F^n : x_i = f_{i,j}(x_1,\ldots,x_{i-1})\};$$

thus the zero locus of $P$ can be covered by a small number of graphs.
Note that when $i = 1$ the functions $f_{1,j}$ are simply constants. Conclude
in particular that the combinatorial Nullstellensatz holds for this choice
of $P$ and $d_1,\ldots,d_n$.

9.1.3 [4] Let $F$ be an arbitrary field and $P \in F[t_1,\ldots,t_n]$ be a polynomial. Let
$S_1,\ldots,S_n$ be non-empty subsets of $F$ and let $g_1,\ldots,g_n \in F[t_1,\ldots,t_n]$
be the polynomials defined by $g_i(t_1,\ldots,t_n) := \prod_{s \in S_i} (t_i - s)$ for each
$1 \le i \le n$. If $P$ vanishes on $S_1 \times \cdots \times S_n$, show that there are polynomials
$h_1,\ldots,h_n \in F[t_1,\ldots,t_n]$ satisfying $\deg h_i \le \deg P - \deg g_i$ so that

$$P = \sum_{i=1}^{n} h_i g_i.$$

Moreover, the coefficients of $h_1,\ldots,h_n$ can be chosen to lie in the ring
generated by the coefficients of $P$ and $g_1,\ldots,g_n$. Use this and the previous
exercise to provide an alternative proof of Theorem 9.2. This should
be contrasted with the Hilbert Nullstellensatz, which asserts that given
arbitrary polynomials $P, g_1,\ldots,g_n \in F[t_1,\ldots,t_n]$, with $P$ vanishing on
the algebraic variety determined by $g_1,\ldots,g_n$, then some power $P^k$ of
$P$ can be written as a linear combination $P^k = \sum_{i=1}^{n} h_i g_i$ of $g_1,\ldots,g_n$.

9.1.4 Let $d_1,\ldots,d_n \ge 0$ be integers, and let $F$ be a field whose characteristic
is either zero or is greater than $\max(d_1,\ldots,d_n)$. Let $P \in F[t_1,\ldots,t_n]$
be such that the $t_1^{d_1} \cdots t_n^{d_n}$ coefficient is non-zero, but
that no other non-zero monomial in $P$ is divisible by $t_1^{d_1} \cdots t_n^{d_n}$.
Let $S_1,\ldots,S_n \subset F$ be such that $|S_i| > d_i$ for all $1 \le i \le n$. Show
that there exist $x_1 \in S_1,\ldots,x_n \in S_n$ such that $P(x_1,\ldots,x_n) \neq 0$.
(Hint: for each $1 \le i \le n$, construct a function $g_i : S_i \to F$ such that
$\sum_{x_i \in S_i} g_i(x_i)\, x_i^j = I(j = d_i)$ for all $0 \le j \le d_i$. Then consider the quantity
$\sum_{x_1 \in S_1} \cdots \sum_{x_n \in S_n} P(x_1,\ldots,x_n)\, g_1(x_1) \cdots g_n(x_n)$.)

9.1.5 Let $F$ be a field and $m$ a positive integer. Let $F^{\{0,1\}^m}$ be the ring of
functions from $\{0,1\}^m$ to $F$, and for each $i \in [1,m]$ let $x_i \in F^{\{0,1\}^m}$ be
the coordinate functions $(x_1,\ldots,x_m) \mapsto x_i$. Show that the multilinear
monomials $\prod_{i \in I} x_i$, $I \subseteq [1,m]$, constitute a basis of $F^{\{0,1\}^m}$, viewed as a
vector space over $F$. (Hint: to establish linear independence, use Theorem
9.2.) In the case $F = \mathbf{C}$ or $F = \mathbf{R}$, show that this result also follows from
(4.4) applied to the group $\mathbf{Z}_2^m$.


9.2 Restricted sum sets 


We now apply the combinatorial Nullstellensatz to obtain lower bounds for sum 
sets, and restricted sum sets. We begin with a general lemma which gives a criterion 
for when such lower bounds on restricted sum sets can be attained. 


Lemma 9.3 [11] Let $F$ be a field, let $n \ge 1$, and let $h \in F[t_1,\ldots,t_n]$ be a polynomial.
Let $K \ge 0$, and let $A_1,\ldots,A_n$ be additive sets in $F$, such that $\sum_{i=1}^{n} |A_i| =
K + n + \deg(h)$. Suppose also that the polynomial $(t_1 + \cdots + t_n)^K h(t_1,\ldots,t_n)$
contains a non-zero coefficient at $t_1^{|A_1|-1} \cdots t_n^{|A_n|-1}$. Then

$$|\{a_1 + \cdots + a_n : a_i \in A_i \text{ for all } 1 \le i \le n;\ h(a_1,\ldots,a_n) \neq 0\}| \ge K + 1. \qquad (9.1)$$


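Proof Suppose for contradiction that (9.1) failed; then one can find a set $B \subset F$
of cardinality $|B| = K$ which contains the set in (9.1). Let $P \in F[t_1,\ldots,t_n]$ be
the polynomial

$$P(t_1,\ldots,t_n) := h(t_1,\ldots,t_n) \prod_{b \in B} (t_1 + \cdots + t_n - b).$$

Observe that $\deg(P) = K + \deg(h) = \sum_{i=1}^{n} (|A_i| - 1)$, and that the terms of this degree
in $P$ agree with those of $(t_1 + \cdots + t_n)^K h(t_1,\ldots,t_n)$, so $P$ has a non-zero
coefficient at $t_1^{|A_1|-1} \cdots t_n^{|A_n|-1}$. On the other hand, by construction of $B$ we see
that $P$ vanishes on $A_1 \times \cdots \times A_n$. But this contradicts the combinatorial
Nullstellensatz. □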














This powerful lemma allows one to reduce the task of establishing lower bounds
on restricted sum sets to that of verifying that a single coefficient of an explicit
polynomial is non-zero in the field $F$. As two quick applications of this lemma
we reprove the Cauchy–Davenport inequality (Theorem 5.4) and then derive a
variant, first conjectured by Erdős and Heilbronn, concerning the restricted sums

$$A \hat{+} B := \{a + b : a \in A,\ b \in B,\ a \neq b\}.$$


Theorem 9.4 (Cauchy–Davenport inequality, again) [47], [68] Let $F = F_p$
be a finite field of prime order. If $A, B$ are two additive sets in $F$, then

$$|A + B| \ge \min(|A| + |B| - 1, p).$$

We shall give a third proof of this theorem via the Fourier transform in
Section 9.8.

Proof The claim is trivial when $|A| + |B| > p$ (see Exercise 2.1.6) so let us take
$|A| + |B| \le p$. We apply Lemma 9.3 with $n = 2$, $(A_1, A_2) = (A, B)$, $h = 1$, and
$K := |A| + |B| - 2$; we will be done as soon as we verify that $(t_1 + t_2)^K$ has a non-zero
coefficient at $t_1^{|A|-1} t_2^{|B|-1}$ in $F_p$. But this coefficient is simply $\binom{K}{|A|-1} \bmod p$,
which is non-zero since $K < p$. □
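Both halves of this argument are easy to test numerically. The following sketch (the helper names and the choice $p = 7$ are assumptions made purely for illustration) checks that the binomial coefficient from the proof is non-zero mod $p$ whenever $|A| + |B| \le p$, and verifies the inequality itself exhaustively for small set sizes.

```python
from itertools import combinations
from math import comb

def sumset(A, B, p):
    """The sum set A + B inside Z_p."""
    return {(a + b) % p for a in A for b in B}

def check_cauchy_davenport(p, size_a, size_b):
    """Exhaustively test |A+B| >= min(|A|+|B|-1, p) for all A, B of the given sizes."""
    bound = min(size_a + size_b - 1, p)
    return all(
        len(sumset(A, B, p)) >= bound
        for A in combinations(range(p), size_a)
        for B in combinations(range(p), size_b)
    )

p = 7
# Coefficient from the proof: binom(K, |A|-1) with K = |A|+|B|-2 < p.
coefficient_ok = all(
    comb(a + b - 2, a - 1) % p != 0
    for a in range(1, p) for b in range(1, p) if a + b <= p
)
inequality_ok = all(
    check_cauchy_davenport(p, a, b) for a in range(1, 4) for b in range(1, 4)
)
print(coefficient_ok, inequality_ok)
```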














As a special case of the Cauchy–Davenport inequality we see that $|A + A| \ge
\min(2|A| - 1, p)$ for any additive set $A$ in $F_p$. The analogous result for restricted
sums $A \hat{+} A$ took much longer to prove. It is easy to see that $|A \hat{+} A| = p$ when
$2|A| - 3 > p$ (Exercise 9.2.1). In 1964, Erdős and Heilbronn (see [89]) conjectured
that $|A \hat{+} A| \ge \min(2|A| - 3, p)$; this bound is easily seen to be optimal
(Exercise 9.2.3). This innocuous-seeming variant of the Cauchy–Davenport inequality
resisted attempts at solution for about thirty years; the $e$-transform methods in
Section 5.1 do not appear to be able to prove the Erdős–Heilbronn conjecture.
The conjecture was finally solved in 1994 by da Silva and Hamidoune [66], who
confirmed it using a general result concerning Grassmann spaces. We now give a
short proof due to Alon, Nathanson, and Ruzsa [11] using the combinatorial
Nullstellensatz, which demonstrates the power and simplicity of this method. Indeed
one can prove slightly more:


Theorem 9.5 [11] Let $F = F_p$ for some prime $p$, and let $A, B$ be two additive
sets in $F$. Then

$$|A \hat{+} B| \ge \min(|A| + |B| - 3, p).$$

Furthermore, if $|A| \neq |B|$, then we can improve the above bound to

$$|A \hat{+} B| \ge \min(|A| + |B| - 2, p).$$


Proof The case $|A| + |B| - 2 > p$ is easy (Exercise 9.2.1), so suppose $|A| +
|B| - 2 \le p$. The cases $|A| = 1$ or $|B| = 1$ are also trivial (Exercise 9.2.2), so
assume $|A|, |B| \ge 2$. By deleting one element from $A$ or $B$ if necessary it suffices
to obtain the latter bound in the case $|A| \neq |B|$.

We now apply Lemma 9.3 with $n = 2$, $(A_1, A_2) = (A, B)$, $h(t_1, t_2) := t_1 - t_2$,
and $K = |A| + |B| - 3$. We will be done as soon as we verify that $(t_1 - t_2)
(t_1 + t_2)^K$ contains a non-zero coefficient at $t_1^{|A|-1} t_2^{|B|-1}$. But this quantity can be
computed as

$$\binom{|A| + |B| - 3}{|A| - 2} - \binom{|A| + |B| - 3}{|A| - 1} \bmod p
= \frac{(|A| + |B| - 3)!}{(|A| - 1)!\,(|B| - 1)!}\,(|A| - |B|) \bmod p.$$

Since $|A| + |B| - 2 \le p$ and $|A| \neq |B|$, we see that this quantity is non-zero, and we are done. □
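As a sanity check, the two bounds of Theorem 9.5 can be verified exhaustively in a small field. This is only a sketch: the function names and the choice $p = 7$ are assumptions of the example. It also confirms that an arithmetic progression attains the bound $2|A| - 3$ in the case $A = B$.

```python
from itertools import combinations

def restricted_sumset(A, B, p):
    """Sums a + b (mod p) with a in A, b in B and a != b."""
    return {(a + b) % p for a in A for b in B if a != b}

def check_theorem_9_5(p, size_a, size_b):
    """Test the bound with slack 3 when |A| = |B| and slack 2 otherwise."""
    slack = 3 if size_a == size_b else 2
    bound = min(size_a + size_b - slack, p)
    return all(
        len(restricted_sumset(A, B, p)) >= bound
        for A in combinations(range(p), size_a)
        for B in combinations(range(p), size_b)
    )

p = 7
theorem_ok = all(check_theorem_9_5(p, a, b) for a in range(1, 5) for b in range(1, 5))

# Sharpness when A = B: the progression {0, 1, 2} has restricted sums {1, 2, 3}.
A = [0, 1, 2]
extremal_size = len(restricted_sumset(A, A, p))
print(theorem_ok, extremal_size)
```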
















Clearly one can obtain further applications of Lemma 9.3; see for instance
Exercise 9.2.4. But when one considers restricted sums of multiple sets one begins to
need to study the coefficients of increasingly complicated polynomials, frequently
involving such expressions as Vandermonde determinants. We shall therefore turn
our attention next to the study of such polynomials and their coefficients. Our
computations here shall be completely abstract, valid for indeterminates $x_1,\ldots,x_n$
taking values in any field $F$.


Definition 9.6 (Vandermonde determinant) If $n \ge 1$ and $x_1,\ldots,x_n$ are
indeterminates, we define the Vandermonde determinant to be the expression

$$\Delta_n(x_1,\ldots,x_n) := \prod_{1 \le i < j \le n} (x_j - x_i) = (-1)^{\binom{n}{2}} \prod_{1 \le i < j \le n} (x_i - x_j).$$

It is easy to verify the symmetries

$$\Delta_n(x_1 + y,\ldots,x_n + y) = \Delta_n(x_1,\ldots,x_n);$$
$$\Delta_n(\lambda x_1,\ldots,\lambda x_n) = \lambda^{\binom{n}{2}} \Delta_n(x_1,\ldots,x_n); \qquad (9.2)$$
$$\Delta_n(\pi(x)) = \mathrm{sgn}(\pi)\, \Delta_n(x)$$

for any variables $\lambda, y$ and $x = (x_1,\ldots,x_n)$, and any permutation $\pi \in S_n$. In fact
this effectively determines $\Delta_n$ up to constants; see Exercise 9.2.5. The quantity
$\Delta_n(1,\ldots,n) = \prod_{j=1}^{n} (j-1)!$ is sometimes called the superfactorial of $n$.
The following well-known fact will be left as an exercise:


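Lemma 9.7 Let $n \ge 1$, and for each $1 \le i \le n$ let $P_i(x)$ be a monic polynomial
of degree $i - 1$. Then for any variables $x_1,\ldots,x_n$ we have the identity

$$\det(P_i(x_j))_{1 \le i,j \le n} = \sum_{\pi \in S_n} \mathrm{sgn}(\pi) \prod_{i=1}^{n} P_{\pi(i)}(x_i)
= \sum_{\pi \in S_n} \mathrm{sgn}(\pi) \prod_{i=1}^{n} P_i(x_{\pi(i)})
= \Delta_n(x_1,\ldots,x_n).$$

In particular we have

$$\Delta_n(x_1,\ldots,x_n) = \sum_{\pi \in S_n} \mathrm{sgn}(\pi) \prod_{i=1}^{n} x_i^{\pi(i)-1}. \qquad (9.3)$$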
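The permutation expansion (9.3) and the superfactorial value of $\Delta_n(1,\ldots,n)$ can be confirmed on concrete integers. The sketch below uses hypothetical helper names and an arbitrary sample point; permutations are indexed from 0, so the exponent $\pi(i) - 1$ appears as `perm[i]`.

```python
from itertools import permutations
from math import factorial, prod

def vandermonde(xs):
    """Delta_n(x_1,...,x_n) as the product of (x_j - x_i) over i < j."""
    n = len(xs)
    return prod(xs[j] - xs[i] for i in range(n) for j in range(i + 1, n))

def sign(perm):
    """Sign of a permutation of 0..n-1, computed by counting inversions."""
    n = len(perm)
    inversions = sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n))
    return -1 if inversions % 2 else 1

def vandermonde_by_permutations(xs):
    """Right-hand side of (9.3): sum over pi of sgn(pi) * prod x_i^(pi(i)-1)."""
    n = len(xs)
    return sum(
        sign(perm) * prod(xs[i] ** perm[i] for i in range(n))
        for perm in permutations(range(n))
    )

xs = [2, -1, 5, 3]
lhs, rhs = vandermonde(xs), vandermonde_by_permutations(xs)
superfactorial = prod(factorial(j - 1) for j in range(1, 6))  # 0! 1! 2! 3! 4!
print(lhs == rhs, vandermonde([1, 2, 3, 4, 5]) == superfactorial)
```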
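The formula (9.3) computes the coefficients of $\Delta_n(x_1,\ldots,x_n)$ exactly. Applying
it with each $x_i$ replaced by $x_i^m$ and multiplying with the multinomial formula

$$(x_1 + \cdots + x_n)^K = \sum_{c_1,\ldots,c_n \ge 0:\ c_1 + \cdots + c_n = K} \frac{K!}{c_1! \cdots c_n!}\, x_1^{c_1} \cdots x_n^{c_n}$$

we obtain the formula

$$(x_1 + \cdots + x_n)^K \Delta_n(x_1^m,\ldots,x_n^m)
= \sum_{c_1,\ldots,c_n \ge 0:\ c_1 + \cdots + c_n = K + m\binom{n}{2}} \Big( \sum_{\pi \in S_n} \mathrm{sgn}(\pi) \frac{K!}{\prod_{i=1}^{n} (c_i - \pi(i)m + m)!} \Big)\, x_1^{c_1} \cdots x_n^{c_n} \qquad (9.4)$$

where we adopt the convention that $1/k! = 0$ when $k$ is a negative integer.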
In certain cases, the expression on the right-hand side of (9.4) can be simplified. 
For instance, in the m = 1 case we have 


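Lemma 9.8 Let $n, K \ge 0$. Then we have

$$(x_1 + \cdots + x_n)^K \Delta_n(x_1,\ldots,x_n)
= \sum_{c_1,\ldots,c_n \ge 0:\ c_1 + \cdots + c_n = K + \binom{n}{2}} \frac{K!}{c_1! \cdots c_n!}\, \Delta_n(c_1,\ldots,c_n)\, x_1^{c_1} \cdots x_n^{c_n}. \qquad (9.5)$$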


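Proof By (9.4), it suffices to establish the identity

$$\sum_{\pi \in S_n} \mathrm{sgn}(\pi) \frac{K!}{\prod_{i=1}^{n} (c_i - \pi(i) + 1)!} = \frac{K!}{c_1! \cdots c_n!}\, \Delta_n(c_1,\ldots,c_n).$$

If we introduce the falling factorial

$$(x)_n := x(x - 1) \cdots (x - n + 1), \qquad (9.6)$$

then from Lemma 9.7 we have

$$\Delta_n(c_1,\ldots,c_n) = \sum_{\pi \in S_n} \mathrm{sgn}(\pi) \prod_{i=1}^{n} (c_i)_{\pi(i)-1} = \sum_{\pi \in S_n} \mathrm{sgn}(\pi) \prod_{i=1}^{n} \frac{c_i!}{(c_i - \pi(i) + 1)!}$$

and the claim follows. □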
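Identity (9.5) can be tested by expanding both sides for small parameters. The sketch below is illustrative only: polynomials are stored as dictionaries from exponent tuples to coefficients, the helper names are hypothetical, and the parameters $n = 3$, $K = 2$ are an arbitrary choice. `Fraction` is used on the right-hand side so that the check does not presuppose that the coefficients are integers.

```python
from fractions import Fraction
from itertools import product as cartesian
from math import factorial, prod

def poly_mul(f, g):
    """Multiply polynomials stored as {exponent tuple: coefficient}."""
    h = {}
    for ea, ca in f.items():
        for eb, cb in g.items():
            e = tuple(x + y for x, y in zip(ea, eb))
            h[e] = h.get(e, 0) + ca * cb
    return {e: c for e, c in h.items() if c}

def unit(i, n):
    """Exponent tuple of the single variable x_{i+1}."""
    return tuple(int(j == i) for j in range(n))

def lhs_of_9_5(n, K):
    """Expand (x_1 + ... + x_n)^K * Delta_n(x_1, ..., x_n)."""
    result = {(0,) * n: 1}
    linear = {unit(i, n): 1 for i in range(n)}
    for _ in range(K):
        result = poly_mul(result, linear)
    for i in range(n):
        for j in range(i + 1, n):
            result = poly_mul(result, {unit(j, n): 1, unit(i, n): -1})  # x_j - x_i
    return result

def rhs_of_9_5(n, K):
    """Coefficients K!/(c_1! ... c_n!) * Delta_n(c_1, ..., c_n) for admissible c."""
    total = K + n * (n - 1) // 2
    out = {}
    for c in cartesian(range(total + 1), repeat=n):
        if sum(c) != total:
            continue
        delta = prod(c[j] - c[i] for i in range(n) for j in range(i + 1, n))
        coeff = Fraction(factorial(K) * delta, prod(factorial(ci) for ci in c))
        if coeff:
            out[c] = coeff
    return out

identity_holds = lhs_of_9_5(3, 2) == rhs_of_9_5(3, 2)
print(identity_holds)
```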





This lemma already gives a generalization of the Erdős–Heilbronn conjecture;
see Exercises 9.2.9 and 9.2.10.
In a similar spirit we have 


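Lemma 9.9 Let $n, m, K, k \ge 0$ be such that

$$(k - n) + (k - n + 1) + \cdots + (k - 1) = K + m\binom{n}{2}.$$

Then the coefficient of $x_1^{k-n} x_2^{k-n+1} \cdots x_n^{k-1}$ in $(x_1 + \cdots + x_n)^K \Delta_n(x_1^m,\ldots,x_n^m)$ is

$$\frac{K!}{\prod_{i=1}^{n} (k - 1 - (i-1)m)!}\, m^{\binom{n}{2}} \Delta_n(1,\ldots,n).$$

Proof By (9.4), it suffices to establish the identity

$$\sum_{\pi \in S_n} \mathrm{sgn}(\pi) \frac{K!}{\prod_{i=1}^{n} (k - n - 1 + i - \pi(i)m + m)!}
= \frac{K!}{\prod_{i=1}^{n} (k - 1 - (i-1)m)!}\, m^{\binom{n}{2}} \Delta_n(1,\ldots,n).$$

Relabeling $i$ by $\pi(i)$ and using the fact that $\mathrm{sgn}(\pi) = \mathrm{sgn}(\pi^{-1})$, the left-hand side
can be rewritten as

$$\sum_{\pi \in S_n} \mathrm{sgn}(\pi) \frac{K!}{\prod_{i=1}^{n} (k - n - 1 + \pi(i) - (i-1)m)!}$$

which can be rewritten further using the falling factorial (9.6) as

$$\frac{K!}{\prod_{i=1}^{n} (k - 1 - (i-1)m)!} \sum_{\pi \in S_n} \mathrm{sgn}(\pi) \prod_{i=1}^{n} (k - 1 - (i-1)m)_{n - \pi(i)}.$$

Writing $n - \pi(i) = \sigma(i) - 1$ and noting that $\mathrm{sgn}(\pi) = (-1)^{\binom{n}{2}} \mathrm{sgn}(\sigma)$, we rewrite
this further as

$$\frac{K!}{\prod_{i=1}^{n} (k - 1 - (i-1)m)!}\, (-1)^{\binom{n}{2}} \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \prod_{i=1}^{n} (k - 1 - (i-1)m)_{\sigma(i)-1}$$

which by Lemma 9.7 becomes

$$\frac{K!}{\prod_{i=1}^{n} (k - 1 - (i-1)m)!}\, (-1)^{\binom{n}{2}} \Delta_n(k-1,\, k-1-m,\, \ldots,\, k-1-(n-1)m).$$

The claim now follows from (9.2). □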

















As a consequence of this computation, we have the following additive combinatorial
result concerning multiple restricted addition, where the restrictions
are of the form $P_i(a_i) \neq P_j(a_j)$ for polynomials $P_i, P_j$.

Theorem 9.10 [234] Let $k, m, n$ be positive integers such that the quantity $K :=
(k - 1)n - (m + 1)\binom{n}{2}$ is non-negative. Let $F$ be a field whose characteristic is
either zero or is a prime number greater than $\max(K, m, n - 1)$. Let $A_1,\ldots,A_n$ be
subsets of $F$ for which $|A_i| \ge k - n + i$ for all $1 \le i \le n$. Let $P_1,\ldots,P_n \in F[t]$
be monic polynomials of degree $m$. Then

$$|\{a_1 + \cdots + a_n \mid a_i \in A_i,\ P_i(a_i) \neq P_j(a_j) \text{ for all } i \neq j\}| \ge K + 1.$$

Proof Without loss of generality, we can assume that $|A_i| = k - n + i =: k_i$. Let
$f \in F[t_1,\ldots,t_n]$ be the polynomial

$$f(t_1,\ldots,t_n) := \prod_{1 \le i < j \le n} (P_j(t_j) - P_i(t_i)).$$

Since $k_i = k - n + i$ we have

$$\sum_{i=1}^{n} (k_i - 1) = (k - 1)n - \binom{n}{2} = K + \deg(f).$$

Thus, the coefficient of $t_1^{k_1-1} \cdots t_n^{k_n-1}$ in the polynomial $(t_1 + \cdots + t_n)^K
f(t_1,\ldots,t_n)$ is the same as the coefficient of $t_1^{k_1-1} \cdots t_n^{k_n-1}$ in

$$(t_1 + \cdots + t_n)^K \prod_{1 \le i < j \le n} (t_j^m - t_i^m) = (t_1 + \cdots + t_n)^K \Delta_n(t_1^m,\ldots,t_n^m).$$

Applying Lemma 9.3 and Lemma 9.9, we reduce to showing that

$$\frac{K!}{\prod_{i=1}^{n} (k - 1 - (i-1)m)!}\, m^{\binom{n}{2}} \Delta_n(1,\ldots,n) \cdot 1 \neq 0.$$

But since the characteristic of $F$ is either 0 or exceeds $\max(K, n - 1, m)$, the claim
is easily verified. □














Next we consider what happens if we raise the factors $(x_j - x_i)$ in
$\Delta_n(x_1,\ldots,x_n)$ to arbitrary powers. A useful result in this regard is

Theorem 9.11 (Dyson's conjecture) Let $a_1,\ldots,a_n$ be positive integers. The
coefficient of $\prod_{i=1}^{n} x_i^{(n-1)a_i}$ in

$$\prod_{i,j \in [1,n]:\ i \neq j} (x_j - x_i)^{a_j}$$

is

$$\frac{(a_1 + \cdots + a_n)!}{a_1! \cdots a_n!}.$$

This result was conjectured by Dyson [74] based on a problem in particle
physics. It was verified by Gunson [165] and independently by Wilson [383] in
1962. We present a short and elegant proof due to Good [135].

Proof Let $x = (x_1,\ldots,x_n)$, $a = (a_1,\ldots,a_n)$ and

$$F(x, a) := \prod_{i,j \in [1,n]:\ i \neq j} \Big(1 - \frac{x_i}{x_j}\Big)^{a_j},$$

and let $F_0(a)$ denote the constant term in $F(x, a)$. It will suffice to prove that
$F_0(a) = \frac{(a_1 + \cdots + a_n)!}{a_1! \cdots a_n!}$ whenever the $a_i$ are non-negative integers.

We induce on $n$. The claim is trivial when $n = 0$, so suppose $n \ge 1$ and the
claim has already been proven for $n - 1$. We can then assume that none of the
$a_i$ are zero since we can simply eliminate that variable (noting that in that case
the $x_i$ variable only appears with a positive exponent) and apply the induction
hypothesis. Thus $a_i \ge 1$ for all $i \in [1, n]$.

Let $e_1,\ldots,e_n$ be the standard basis vectors of $\mathbf{Z}^n$. It will suffice to prove the
recursion

$$F_0(a) = \sum_{i=1}^{n} F_0(a - e_i)$$

whenever $a_i \ge 1$ for all $i$, since the claim then follows from the multinomial Pascal identity
and an easy induction on $\sum_{j=1}^{n} a_j$.

By applying Lagrange's interpolation formula (Exercise 9.2.8) to the function
$f(x) = 1$ we have the identity

$$1 = \sum_{j=1}^{n} \prod_{i \in [1,n]:\ i \neq j} \frac{y - x_i}{x_j - x_i}$$

for all $y$. Setting $y = 0$, and then replacing each $x_i$ by $1/x_i$, we have

$$1 = \sum_{j=1}^{n} \prod_{i \in [1,n]:\ i \neq j} \Big(1 - \frac{x_i}{x_j}\Big)^{-1}. \qquad (9.7)$$

By multiplying both sides of (9.7) with $F(x, a)$, we see that if $a_j > 0$ for all
$1 \le j \le n$ then we have the recursion

$$F(x, a) = \sum_{j=1}^{n} F(x, a - e_j)$$

and the claim follows by extracting the constant coefficient. □
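For small exponent vectors the coefficient in Theorem 9.11 can be computed directly by expanding the product. The sketch below is a brute-force illustration under assumed helper names (`poly_mul`, `dyson_coefficient`, `multinomial`); it is not Good's argument, only a numerical check of the statement.

```python
from math import factorial, prod

def poly_mul(f, g):
    """Multiply polynomials stored as {exponent tuple: integer coefficient}."""
    h = {}
    for ea, ca in f.items():
        for eb, cb in g.items():
            e = tuple(x + y for x, y in zip(ea, eb))
            h[e] = h.get(e, 0) + ca * cb
    return {e: c for e, c in h.items() if c}

def dyson_coefficient(a):
    """Coefficient of prod x_i^((n-1) a_i) in prod_{i != j} (x_j - x_i)^(a_j)."""
    n = len(a)
    unit = lambda i: tuple(int(k == i) for k in range(n))
    poly = {(0,) * n: 1}
    for j in range(n):
        for i in range(n):
            if i != j:
                factor = {unit(j): 1, unit(i): -1}  # x_j - x_i
                for _ in range(a[j]):
                    poly = poly_mul(poly, factor)
    target = tuple((n - 1) * aj for aj in a)
    return poly.get(target, 0)

def multinomial(a):
    """(a_1 + ... + a_n)! / (a_1! ... a_n!)."""
    return factorial(sum(a)) // prod(factorial(ai) for ai in a)

samples = [(1, 1), (2, 1), (1, 1, 1), (1, 2, 1), (2, 2, 1)]
dyson_ok = all(dyson_coefficient(a) == multinomial(a) for a in samples)
print(dyson_ok)
```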


As one particular consequence of Theorem 9.11, we see that the coefficient
of $\prod_{i=1}^{n} x_i^{(n-1)m}$ in $\Delta_n(x_1,\ldots,x_n)^{2m}$ is $(-1)^{m\binom{n}{2}} (nm)!/(m!)^n$. Using this fact and some
additional arguments, Hou and Sun [186] proved the following generalization.

Lemma 9.12 Let $n, m, k \ge 0$, and let $s := k + m(n - 1)$. Then the coefficient of
$x_1^s \cdots x_n^s$ in $(x_1 + \cdots + x_n)^{kn} \Delta_n(x_1,\ldots,x_n)^{2m}$ is

$$(-1)^{m\binom{n}{2}} \frac{(kn)!}{(m!)^n} \prod_{j=1}^{n} \frac{(jm)!}{(s - (j-1)m)!}.$$

The proof of this lemma is somewhat technical and we refer the reader to [186]
for details. As an additive combinatorial consequence, we can control restricted
sum sets where the differences $a_i - a_j$ are required to avoid certain specified sets.


Theorem 9.13 [186] Let $k, m, n$ be positive integers and $F$ be a field of
characteristic $p$ where $p$ is zero or $p$ is a prime satisfying

$$p > k \max\{m,\ n + m - mk - 1\}.$$

Let $A_1,\ldots,A_k$ be subsets of $F$ with cardinality at least $n$. For any $i, j \in
\{1,\ldots,k\}$, $i \neq j$, let $S_{ij}$ be a subset of $F$ with cardinality at most $m$. Then the
set

$$C := \{a_1 + \cdots + a_k \mid a_i \in A_i,\ a_i - a_j \notin S_{ij} \text{ if } i \neq j\}$$

has cardinality at least

$$|C| \ge (n + m - mk - 1)k + 1.$$


Proof We first need the following variant of Lemma 9.3, whose proof we leave
to Exercise 9.2.11.

Lemma 9.14 Let $A_1,\ldots,A_k$ be finite subsets of a field $F$. Assume that $|A_i| \ge n_i$.
Let $\lambda, \mu \in F[t_1,\ldots,t_k]$ be such that $\deg(\mu) > 0$. Define

$$C := \{\mu(a_1,\ldots,a_k) \mid a_i \in A_i,\ \lambda(a_1,\ldots,a_k) \neq 0\}.$$

Then there is no polynomial $\omega \in F[t_1,\ldots,t_k]$ such that the polynomial $\lambda \omega \mu^{|C|}$
has degree $\sum_{i=1}^{k} (n_i - 1)$ and the coefficient of $t_1^{n_1-1} \cdots t_k^{n_k-1}$ in this polynomial
is non-zero.


To prove Theorem 9.13, we can assume, without loss of generality, that $|A_i| = n$
and $|S_{ij}| = m$ for all $i, j$. Let $l := n + m - mk - 1$. Assume, for contradiction,
that $|C| \le lk$. Let $\lambda, \mu, \omega \in F[t_1,\ldots,t_k]$ be the polynomials

$$\lambda(t_1,\ldots,t_k) := \prod_{1 \le i \neq j \le k}\ \prod_{c_{ij} \in S_{ij}} (t_i - t_j - c_{ij}),$$
$$\mu(t_1,\ldots,t_k) := t_1 + \cdots + t_k,$$
$$\omega := \mu^{lk - |C|}.$$

The polynomial $\lambda \omega \mu^{|C|}$ has total degree $mk(k - 1) + lk = k(n - 1) =
\sum_{i=1}^{k} (|A_i| - 1)$. Moreover, the coefficient of $t_1^{n-1} \cdots t_k^{n-1}$ in this polynomial is,
up to sign, the same as that in

$$\prod_{1 \le i < j \le k} (t_i - t_j)^{2m}\, \mu^{lk} = \Delta_k(t_1,\ldots,t_k)^{2m} (t_1 + \cdots + t_k)^{lk}.$$

But this is non-zero thanks to Lemma 9.12 and the hypotheses on the characteristic
$p$. This contradicts Lemma 9.14, completing the proof. □














Exercises 

9.2.1 Let $Z$ be any finite additive group of odd order, and let $A, B$ be additive
sets in $Z$. Show that if $|A| + |B| - 2 > |Z|$, then $A \hat{+} B = Z$. (Compare
with Exercise 2.1.6.)

9.2.2 Verify Theorem 9.5 when $|A| = 1$ or $|B| = 1$.

9.2.3 Give examples to show that the bound $|A \hat{+} A| \ge \min(2|A| - 3, p)$ cannot
be improved. What about the bound $|A \hat{+} B| \ge \min(|A| + |B| - 2, p)$
when $|A| \neq |B|$?

9.2.4 Let $F = F_p$ be a finite field of prime order, and let $A, B$ be additive sets
in $F_p$. Show that

$$|\{a + b \mid a \in A,\ b \in B,\ ab \neq 1\}| \ge \min(|A| + |B| - 3, p).$$

9.2.5 Verify the symmetries (9.2). Furthermore, show that if $P(x_1,\ldots,x_n)$ is
any polynomial which obeys the same symmetries (9.2) as $\Delta_n$, then $P$ is
a scalar multiple of $\Delta_n$.

9.2.6 Prove Lemma 9.7. (Hint: one can use Gaussian elimination to reduce to
the case $P_i(x) = x^{i-1}$. Then locate several linear factors of the determinant $V(x_1,\ldots,x_n)$
and use the factor theorem. Alternatively, use Exercise 9.2.5.)

9.2.7 Show that if $x_1,\ldots,x_n$ are integers, then $\Delta_n(x_1,\ldots,x_n)$ is a multiple of
$\Delta_n(1,\ldots,n) = \prod_{j=1}^{n} (j-1)!$.

9.2.8 (Lagrange interpolation formula) Let $F$ be a field, let $n \ge 0$, let $a_0,\ldots,a_n$
be $n + 1$ distinct elements of $F$ and let $b_0,\ldots,b_n$ be $n + 1$ arbitrary
elements of $F$. Show that there is exactly one polynomial $f \in F[t]$ with
coefficients in $F$ of degree at most $n$ such that $f(a_i) = b_i$, and that this
polynomial is given by

$$f(t) = \sum_{i=0}^{n} b_i \prod_{0 \le j \le n:\ j \neq i} \frac{t - a_j}{a_i - a_j}.$$

9.2.9 [11] Let $F = F_p$ be a finite field of prime order, and let $A_1,\ldots,A_k$
be additive sets in $F_p$ with $|A_1|,\ldots,|A_k|$ all distinct and $\sum_{i=1}^{k} |A_i| \le
p + \binom{k+1}{2} - 1$. Let $B$ be the restricted sum set

$$B := \{a_1 + \cdots + a_k : a_i \in A_i,\ a_i \neq a_j \text{ for all } 1 \le i < j \le k\}.$$

Using Lemma 9.3 and Lemma 9.8, establish the inequality

$$|B| \ge \sum_{i=1}^{k} |A_i| - \binom{k+1}{2} + 1.$$

9.2.10 [66] (Generalized Erdős–Heilbronn conjecture) Let $F = F_p$ be a finite
field of prime order, and let $A$ be an additive set in $F_p$. Let $k^\wedge A :=
\{a_1 + \cdots + a_k : a_1,\ldots,a_k \in A,\ a_i \neq a_j \text{ for all } 1 \le i < j \le k\}$ be the
set of $k$-fold sums of distinct elements of $A$. Show that $|k^\wedge A| \ge
\min(p,\ k|A| - k^2 + 1)$. (Hint: use Exercise 9.2.9.)

9.2.11 Prove Lemma 9.14. (Hint: Apply the combinatorial Nullstellensatz to the
polynomial $f := \lambda \omega \prod_{c \in C} (\mu - c)$.)




9.3 Snevily’s conjecture 
In [322], Snevily made the following conjecture. 


Conjecture 9.15 (Snevily's conjecture) [322] Let $Z$ be an additive group of odd
order and let $A, B$ be two additive sets in $Z$ with $|A| = |B|$. Then there is a bijection
$\varphi : A \to B$ such that the sums $\{a + \varphi(a) : a \in A\}$ are all distinct.


The general case of this conjecture remains open, but many special cases are 
known. For instance, using the combinatorial Nullstellensatz, Alon [5] showed 
that the conjecture holds for cyclic groups of prime order. 


Theorem 9.16 [5] Let $F = F_p$ where $p > 2$ is an odd prime and let $A, B$ be two
additive sets in $F$ with $|A| = |B|$. Then there is a bijection $\varphi : A \to B$ such that
the sums $\{a + \varphi(a) : a \in A\}$ are all distinct.

Proof If $A = B$ then one can simply choose $\varphi$ to be the identity map, taking
advantage of the fact that $p$ is odd, so we may assume $A \neq B$. In particular we
can take $k := |A| = |B| < p$. Enumerate $A = \{a_1,\ldots,a_k\}$, and let $P \in F[t_1,\ldots,t_k]$
be the polynomial

$$P(t_1,\ldots,t_k) := \prod_{1 \le i < j \le k} (t_j - t_i)(t_j - t_i + a_j - a_i).$$

Then $\deg(P) = k(k - 1)$. Also, from Theorem 9.11, the coefficient of $t_1^{k-1} \cdots t_k^{k-1}$
in $P$ is, up to sign, $k! \cdot 1$, which is non-zero in $F_p$ since $k < p$. Applying the combinatorial
Nullstellensatz with $S_1 = \cdots = S_k := B$, there are $s_i \in S_i$ such that $P(s_1,\ldots,s_k) \neq 0$. This means
$s_j - s_i \neq 0$ and $s_j - s_i + a_j - a_i \neq 0$ for all $1 \le i < j \le k$. If we then define $\varphi : A \to B$
by setting $\varphi(a_i) := s_i$ we thus see that $\varphi$ is injective (hence surjective), and that
the sums $a_i + \varphi(a_i) = a_i + s_i$ are all distinct, as desired. □
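The existence statement of Theorem 9.16 can be confirmed by exhaustive search in a small case. The following sketch is not the polynomial argument of the proof; it simply tries all orderings of $B$, with the assumed helper name `snevily_bijection` and the arbitrary choices $p = 7$ and $|A| = |B| = 3$.

```python
from itertools import combinations, permutations

def snevily_bijection(A, B, p):
    """Search for a bijection A -> B whose sums a + phi(a) (mod p) are all distinct."""
    for perm in permutations(B):
        sums = {(a + b) % p for a, b in zip(A, perm)}
        if len(sums) == len(A):
            return dict(zip(A, perm))
    return None

p, k = 7, 3
all_found = all(
    snevily_bijection(A, B, p) is not None
    for A in combinations(range(p), k)
    for B in combinations(range(p), k)
)
example = snevily_bijection((0, 1, 2), (0, 1, 3), p)
print(all_found, example)
```

The oddness of the group order matters: in $\mathbf{Z}_2$ with $A = B = \{0, 1\}$ both bijections produce a repeated sum, and the same search returns `None`.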














Let us notice that in the case k < p, we never used the assumption that the 
elements of A are different, and in this case one can in fact generalize to arbitrary 
fields of characteristic p or 0; see Exercise 9.3.2. Also, observe that the proof 
only used a very special case of Dyson’s conjecture. Using this conjecture in full 
generality and modifying the rest of the proof accordingly, we have the following 
more general result. 


Theorem 9.17 Let $F = F_p$ be a field of prime order, let $k < p$, and let $R_1,\ldots,R_k$
be additive sets in $F_p$ such that $\sum_{i=1}^{k} (|R_i| + 1) < p$. Let $a_1,\ldots,a_k \in F_p$, and let
$B_1,\ldots,B_k$ be subsets of $F_p$ with cardinality $|B_i| > (k - 1)(|R_i| + 1)$. Then there
are $k$ pairwise distinct elements $b_1,\ldots,b_k$, where $b_i \in B_i$, such that the sums
$a_i + b_i$ are pairwise distinct and for every $i \neq j$, $a_i + b_i - (a_j + b_j) \notin R_i$.

Proof Let $P \in F[t_1,\ldots,t_k]$ be the polynomial

$$P(t_1,\ldots,t_k) := \Big[ \prod_{1 \le i < j \le k} (t_i - t_j)(a_i + t_i - a_j - t_j) \Big]
\times \prod_{i,j \in [1,k]:\ i \neq j}\ \prod_{r \in R_i} (a_i + t_i - a_j - t_j - r).$$

Then $\deg(P) = \sum_{i=1}^{k} (k - 1)(|R_i| + 1)$. Also, by Theorem 9.11 the coefficient of
$\prod_{i=1}^{k} t_i^{(k-1)(|R_i|+1)}$ in $P$ is, up to sign,

$$\frac{\big(\sum_{i=1}^{k} (|R_i| + 1)\big)!}{\prod_{i=1}^{k} (|R_i| + 1)!}$$

which is non-zero in $F_p$, since $\sum_{i=1}^{k} (|R_i| + 1) < p$ by the assumption of the theorem.
The claim now follows from the combinatorial Nullstellensatz. □














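DasGupta, Károlyi, Serra and Szegedy [67] obtained a multiplicative version of
Snevily's conjecture. Define the Vandermonde permanent $\mathrm{Per}_n(x_1,\ldots,x_n)$ of $n$
variables to be the quantity

$$\mathrm{Per}_n(x_1,\ldots,x_n) := \sum_{\pi \in S_n} \prod_{i=1}^{n} x_i^{\pi(i)-1}$$

(cf. (9.3)).

Lemma 9.18 [67] Let $F$ be an arbitrary field and $a_1,\ldots,a_k$ be elements of $F$.
Assume that the Vandermonde permanent $\mathrm{Per}_k(a_1,\ldots,a_k)$ is non-zero. Then for any
subset $B = \{b_1,\ldots,b_k\}$ of $F$ there is a permutation $\pi \in S_k$ such that the products
$a_1 b_{\pi(1)},\ldots,a_k b_{\pi(k)}$ are all distinct.

Proof Let $f \in F[t_1,\ldots,t_k]$ be the polynomial

$$f(t_1,\ldots,t_k) := \Delta_k(t_1,\ldots,t_k)\, \Delta_k(a_1 t_1,\ldots,a_k t_k).$$

Then $\deg(f) \le k(k - 1)$. Set $S_1 = \cdots = S_k = B$. By the combinatorial
Nullstellensatz, it suffices to show that the coefficient of $t_1^{k-1} \cdots t_k^{k-1}$ is not zero. Notice
that

$$f(t_1,\ldots,t_k) = \Big( \sum_{\pi \in S_k} \mathrm{sgn}(\pi) \prod_{i=1}^{k} t_i^{\pi(i)-1} \Big) \Big( \sum_{\tau \in S_k} \mathrm{sgn}(\tau) \prod_{i=1}^{k} (a_i t_i)^{\tau(i)-1} \Big)$$
$$= \Big( \sum_{\pi \in S_k} \mathrm{sgn}(\pi) \prod_{i=1}^{k} t_i^{\pi(i)-1} \Big) \Big( \sum_{\tau \in S_k} \mathrm{sgn}(\tau) \prod_{i=1}^{k} a_i^{\tau(i)-1} t_i^{\tau(i)-1} \Big).$$

Only the pairs with $\tau(i) = k + 1 - \pi(i)$ contribute, and for these $\mathrm{sgn}(\tau) = (-1)^{\binom{k}{2}} \mathrm{sgn}(\pi)$.
Thus the coefficient in question is exactly

$$(-1)^{\binom{k}{2}} \sum_{\pi \in S_k} \prod_{i=1}^{k} a_i^{k - \pi(i)} = (-1)^{\binom{k}{2}} \mathrm{Per}_k(a_1,\ldots,a_k),$$

which is not zero due to the assumption of the lemma. The proof is thus complete.
(For an alternative proof, see Exercise 9.3.3.) □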
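Lemma 9.18 can be tested exhaustively over a small prime field. In this sketch (the helper names and the choices $p = 7$, $k = 3$ are assumptions of the example) every tuple $(a_1, a_2, a_3)$ with non-zero Vandermonde permanent is paired with every 3-element set $B$, and a permutation with distinct products is sought by brute force.

```python
from itertools import combinations, permutations, product
from math import prod

def vandermonde_permanent(a, p):
    """Per_k(a_1,...,a_k) mod p: sum over pi of prod a_i^(pi(i)-1)."""
    k = len(a)
    return sum(
        prod(pow(a[i], perm[i], p) for i in range(k))
        for perm in permutations(range(k))
    ) % p

def distinct_product_matching(a, B, p):
    """Return an ordering of B making the products a_i * b_i distinct mod p, or None."""
    for perm in permutations(B):
        if len({x * y % p for x, y in zip(a, perm)}) == len(a):
            return perm
    return None

p, k = 7, 3
lemma_ok = all(
    distinct_product_matching(a, B, p) is not None
    for a in product(range(p), repeat=k)
    if vandermonde_permanent(a, p) != 0
    for B in combinations(range(p), k)
)
print(lemma_ok)
```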














One can convert this multiplicative statement to an additive statement by embedding
the additive group as a multiplicative subgroup of a suitable field. For instance,
one can now show that Snevily's conjecture holds for cyclic groups of odd order:

Corollary 9.19 [67] Let $n \ge 1$ be an odd number, and let $A, B$ be two additive
sets in $\mathbf{Z}_n$ such that $|A| = |B|$. Then there exists a bijection $\varphi : A \to B$ such that
the sums $\{a + \varphi(a) : a \in A\}$ are all distinct.

Proof We shall use the theory of finite fields, which we shall review in Section 9.4.
Let $\mathbf{Z}_n^\times$ be the multiplicatively invertible elements of $\mathbf{Z}_n$, and let $\phi(n) := |\mathbf{Z}_n^\times|$ be the
Euler totient function of $n$. By Cauchy's theorem (Exercise 3.1.2), we have $2^{\phi(n)} =
1 \pmod{n}$. Let $F$ be a finite field of order $2^{\phi(n)}$ and characteristic 2 (the existence
of such a field follows from Exercise 9.4.4). From Lemma 9.22, the multiplicative
group $F^\times$ of $F$ contains an element of order $n$, and hence contains a subgroup $G$
isomorphic to the additive group $\mathbf{Z}_n$. It now suffices to verify the multiplicative
form of Snevily's conjecture for $G$. But if $A = \{a_1,\ldots,a_k\}$ is a subset of $G$, then
since $F$ has characteristic 2 one can replace permanents with determinants and
compute

$$\mathrm{Per}_k(a_1,\ldots,a_k) = \Delta_k(a_1,\ldots,a_k) = \prod_{1 \le i < j \le k} (a_j - a_i) \neq 0.$$

The claim now follows from Lemma 9.18. □





A variant of this argument gives a strengthened version of the above result when
the cyclic group has order $p^\alpha$.

Theorem 9.20 [67] Let $p > 2$ be an odd prime, let $q = p^\alpha$ be a power of $p$ for
some $\alpha \ge 1$, let $1 \le k < p$, and let $a_1,\ldots,a_k$ be elements of $\mathbf{Z}_q$. Then for any set
$B = \{b_1,\ldots,b_k\} \subset \mathbf{Z}_q$ of cardinality $k$, there exists a permutation $\pi \in S_k$ such
that the sums $a_i + b_{\pi(i)}$, $1 \le i \le k$, are all distinct.

Proof We will need the machinery of cyclotomic fields, which we shall review
in Section 9.8. Let $\omega$ be a primitive $q$th root of unity, and let $\mathbf{Q}(\omega)$ be the associated
cyclotomic field. Observe that $\mathbf{Q}(\omega)$ contains the multiplicative subgroup
$G := \{\omega^n : n \in \mathbf{Z}\} \subset \mathbf{Q}(\omega)$ which is isomorphic to the additive group $\mathbf{Z}_q$.
Thus it suffices to show that for any $a_1,\ldots,a_k \in G$ and any $B = \{b_1,\ldots,b_k\} \subset G$
of cardinality $k$, there exists a permutation $\pi \in S_k$ such that the products $a_i b_{\pi(i)}$
are all distinct. Applying Lemma 9.18, it suffices to verify that the Vandermonde
permanent $\mathrm{Per}_k(a_1,\ldots,a_k) = \sum_{\pi \in S_k} \prod_{i=1}^{k} a_i^{\pi(i)-1}$ is non-vanishing in $\mathbf{Q}(\omega)$. Note
that each of the summands in this permanent is a $q$th root of unity, and the number
$|S_k| = k!$ of summands is not divisible by $p$. The claim then follows from
Lemma 9.49. □














Exercises 


9.3.1 Show that Conjecture 9.15 fails whenever the ambient group $Z$ has even
order. (Hint: first consider the case $Z = \mathbf{Z}_2$.)

9.3.2 [67] Let $p$ be a prime, let $1 \le k < p$, let $F$ be a field of characteristic
equal to $p$ or zero, and let $a_1,\ldots,a_k \in F$. Then for any subset $B =
\{b_1,\ldots,b_k\}$ of $F$, there is a permutation $\pi \in S_k$ such that the sums $a_1 +
b_{\pi(1)}, a_2 + b_{\pi(2)}, \ldots, a_k + b_{\pi(k)}$ are all different. (By Exercise 9.4.4, this
implies that Snevily's conjecture is true whenever the ambient group is $\mathbf{Z}_p^\alpha$ for
any $\alpha > 0$.)

9.3.3 [67] Let $k \ge 1$, let $R$ be a commutative ring, and let $\pi \in S_k$ be a permutation.
Let $P_\pi \in R[u_1,\ldots,u_k,v_1,\ldots,v_k]$ be the polynomial

$$P_\pi(u_1,\ldots,u_k,v_1,\ldots,v_k) := \prod_{1 \le i < j \le k} (u_j v_{\pi(j)} - u_i v_{\pi(i)}).$$

Verify the identity

$$\sum_{\pi \in S_k} P_\pi(u_1,\ldots,u_k,v_1,\ldots,v_k) = \Delta_k(u_1,\ldots,u_k)\, \mathrm{Per}_k(v_1,\ldots,v_k)$$

and use this to derive an alternative proof of Lemma 9.18.


9.4 Finite fields 


We now pause to develop some of the theory of finite fields. We have already
encountered the finite fields $F_p = \mathbf{Z}_p$ of prime order, but we now discuss more
general finite fields of composite (prime power) order.

To avoid degeneracies we always assume that our fields have order at least 2 (so
that $0 \neq 1$). Note that a finite field $F$ is a finite additive group $(F, 0, +, -)$, but if one
removes the 0 element one obtains a multiplicative group $(F^\times, 1, \times, {}^{-1})$, where
$F^\times := F \setminus \{0\}$. Strictly speaking, a finite field has two multiplicative structures, the
multiplicative group structure $x \times y$ for $x, y \in F$ and the $\mathbf{Z}$-module structure $n \cdot x$
for $n \in \mathbf{Z}$, $x \in F$ coming from iterated addition, but they are clearly related by the
identity $n \cdot x = (n \cdot 1) \times x$; because of this, we shall abuse notation and identify
$n$ with $n \cdot 1$, and also identify the two multiplicative structures.




The most important examples of a finite field are the cyclic groups $F_p := \mathbf{Z}_p$
of prime order $|F_p| = p$. More generally, for any prime $p$ and any integer $k \ge 1$,
one can create a finite field $F_{p^k}$ of order $|F_{p^k}| = p^k$ (Exercise 9.4.4). Such fields
are unique up to field isomorphism (Exercise 9.4.6).

Because a finite field has both an additive and a multiplicative group structure,
we will sometimes subscript certain group-theoretic concepts by addition or
multiplication as appropriate. For instance, we use $\mathrm{ord}_+(x)$ to denote the additive order
of $x \in F$ and $\mathrm{ord}_\times(x)$ to denote the multiplicative order. We now observe that all
non-zero elements $x \in F^\times$ of a finite field have the same additive order $\mathrm{ord}_+(x)$.

Lemma 9.21 Let $F$ be a finite field, and let $p := \mathrm{ord}_+(1)$. Then $p$ is prime, and
$\mathrm{ord}_+(x) = p$ for all $x \in F^\times$.

Proof If $\mathrm{ord}_+(1) = nm$ is composite for some $n, m > 1$, then $m \cdot 1, n \cdot 1 \neq 0$ but
$(n \cdot 1) \times (m \cdot 1) = 0$, which contradicts the fact that $F^\times$ is a multiplicative group.
Thus $\mathrm{ord}_+(1)$ is equal to a prime $p$. Since $p \cdot x = (p \cdot 1) \times x = 0 \times x = 0$, we
see that $\mathrm{ord}_+(x)$ divides $p$ for all $x \in F^\times$; since $\mathrm{ord}_+(x) \neq 1$, the claim follows. □














We call the prime $\mathrm{char}(F) := p = \mathrm{ord}_+(1)$ the characteristic of the finite field
$F$. It is easy to see that $F$ is now a vector space over $F_p$; in particular it has
some dimension $k \ge 1$, and so $|F| = p^k$. From Cauchy's theorem (Exercise 3.1.2)
applied to $F^\times$ we see that $\mathrm{ord}_\times(x)$ divides $|F^\times| = |F| - 1$ for all $x \in F^\times$. In other
words,

$$x^{|F|-1} = 1 \text{ for all } x \in F^\times \qquad (9.8)$$

and thus

$$x^{|F|} = x \text{ for all } x \in F. \qquad (9.9)$$

This has the following consequence. For any positive integer $n$, define the Euler
totient function $\phi(n)$ of $n$ to be the number of elements in $[1, n]$ which are coprime
to $n$ (or equivalently, $\phi(n) = |\mathbf{Z}_n^\times|$).


Lemma 9.22 Let $F$ be a finite field, and let $n \ge 1$ be an integer dividing $|F^\times| =
|F| - 1$. Then we have $|\{x \in F^\times : x^n = 1\}| = n$ and $|\{x \in F^\times : \mathrm{ord}_\times(x) = n\}| =
\phi(n)$.

Proof Since $x^n - 1$ has degree $n$, it has at most $n$ zeroes, thus $|\{x \in F^\times : x^n =
1\}| \le n$. On the other hand, if we write $|F| - 1 = nm$, we see from (9.8) that $y^m$
lies in the set $\{x \in F^\times : x^n = 1\}$ for all $y \in F^\times$. Since the polynomial $y^m - c$
has at most $m$ zeroes for each $c \in F$, we thus see that $|\{x \in F^\times : x^n = 1\}| \ge
|F^\times|/m = n$. This gives the first claim. This implies that

$$\sum_{d | n} |\{x \in F^\times : \mathrm{ord}_\times(x) = d\}| = |\{x \in F^\times : x^n = 1\}| = n = \sum_{d | n} \phi(d)$$

and the second claim now follows from an induction argument. □
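Both counts in Lemma 9.22 are easy to observe in a prime field. The sketch below works in $F_{31}$, an arbitrary choice, with hypothetical helper names; it tabulates the multiplicative order of every non-zero element and compares the counts against the totient function for each divisor of $p - 1$.

```python
from math import gcd

def multiplicative_order(x, p):
    """Order of x in the multiplicative group of F_p (x must be non-zero mod p)."""
    order, power = 1, x % p
    while power != 1:
        power = power * x % p
        order += 1
    return order

def totient(n):
    """Euler totient: number of integers in [1, n] coprime to n."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

p = 31
order_counts = {}
for x in range(1, p):
    d = multiplicative_order(x, p)
    order_counts[d] = order_counts.get(d, 0) + 1

divisors = [d for d in range(1, p) if (p - 1) % d == 0]
matches_totient = all(order_counts.get(d, 0) == totient(d) for d in divisors)
root_counts_ok = all(
    sum(1 for x in range(1, p) if pow(x, n, p) == 1) == n for n in divisors
)
print(matches_totient, root_counts_ok, order_counts[p - 1])
```

The last printed value is the number of elements of order $p - 1$, that is, the number of generators of the multiplicative group.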


Since $\phi(n) \neq 0$ for all $n \geq 1$, we thus see in particular that $F^\times$ contains an
element of order $|F| - 1$; we call such elements primitive elements of $F^\times$. This
implies in particular that $F^\times$ is a multiplicative cyclic group of order $|F| - 1$.
Another consequence is


Lemma 9.23 Let $F$ be a finite field. Then for any $k \geq 1$ and any $h_1, \ldots, h_k \geq 0$
such that $\min(h_1, \ldots, h_k) < |F| - 1$, we have $\sum_{x_1, \ldots, x_k \in F} x_1^{h_1} \cdots x_k^{h_k} = 0$.


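Proof By factorizing the left-hand side, we see that it suffices to show that
$\sum_{x \in F} x^h = 0$ for all $0 \leq h < |F| - 1$. When $h = 0$ we have (with the convention
$0^0 = 1$) $\sum_{x \in F} x^h = |F| \cdot 1 = 0$, since $|F|$ is a multiple of the characteristic
$\mathrm{char}(F)$. Now suppose that $0 < h < |F| - 1$, and let $\omega$ be any primitive element
of $F^\times$. Then $x \mapsto \omega x$ is a bijection on $F$, and so

$$\sum_{x \in F} x^h = \sum_{x \in F} (\omega x)^h = \omega^h \sum_{x \in F} x^h.$$

Since $\omega$ is primitive, $\omega^h \neq 1$, and hence $\sum_{x \in F} x^h = 0$ as claimed.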
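The one-variable case of Lemma 9.23 is easy to test by direct summation. The sketch below assumes $F = \mathbf{F}_p$ with integer representatives and uses the convention $0^0 = 1$ (which Python's three-argument `pow` also follows for a modulus greater than 1); the helper name is hypothetical.

```python
def power_sum(p, h):
    """Sum of x**h over all x in F_p, reduced mod p, with the convention 0**0 == 1."""
    return sum(pow(x, h, p) for x in range(p)) % p

p = 7
for h in range(p):
    print(h, power_sum(p, h))
```

The sums should vanish for $0 \leq h < p - 1$, while for $h = p - 1$ identity (9.8) gives $p - 1 = -1$ in $\mathbf{F}_p$, showing that the hypothesis $h < |F| - 1$ cannot be dropped.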





We can now give the classical theorem of Chevalley and Warning on the number 
of solutions of a system of multi-variable polynomials over a finite field. 


Theorem 9.24 (Chevalley-Warning theorem) Let $F$ be a finite field, let $n \geq 1$,
and let $P_1, \ldots, P_m \in F[t_1, \ldots, t_n]$ be polynomials such that $\sum_{i=1}^m \deg(P_i) < n$. Then
the number of solutions $(x_1, \ldots, x_n) \in F^n$ to the equations

$$P_1(x_1, \ldots, x_n) = \cdots = P_m(x_1, \ldots, x_n) = 0 \tag{9.10}$$

is a multiple of $\mathrm{char}(F)$.


Proof From (9.8) we have

$$\mathbf{I}(P_i(x_1, \ldots, x_n) = 0) = 1 - P_i(x_1, \ldots, x_n)^{|F|-1},$$

so the number of solutions to (9.10), thought of as an element of $F$, can be
expressed as

$$\sum_{x_1, \ldots, x_n \in F} \prod_{i=1}^m \left(1 - P_i(x_1, \ldots, x_n)^{|F|-1}\right).$$

To prove the theorem, it thus suffices to show that

$$\sum_{x_1, \ldots, x_n \in F} \prod_{i=1}^m \left(1 - P_i(x_1, \ldots, x_n)^{|F|-1}\right) = 0. \tag{9.11}$$

By expanding the product $\prod_{i=1}^m (1 - P_i(x_1, \ldots, x_n)^{|F|-1})$ we get a linear
combination of monomials of the form $\prod_{j=1}^n x_j^{a_j}$, each of which has degree at most
$\sum_{i=1}^m \deg(P_i)(|F| - 1) < n(|F| - 1)$. By the pigeonhole principle this means that
$\min(a_1, \ldots, a_n) < |F| - 1$, and thus by Lemma 9.23, each monomial gives a zero
contribution to (9.11). The claim follows.
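The Chevalley-Warning theorem can be illustrated by brute-force enumeration over a small prime field. The sketch below assumes $F = \mathbf{F}_5$; the two polynomials are a hypothetical example system chosen only so that the degree condition $\deg(P_1) + \deg(P_2) = 3 < n = 4$ holds.

```python
from itertools import product

def count_common_zeros(polys, n, p):
    """Count the points x in F_p^n at which every polynomial in polys vanishes mod p."""
    return sum(
        1
        for x in product(range(p), repeat=n)
        if all(P(*x) % p == 0 for P in polys)
    )

# Hypothetical example system over F_5 in n = 4 variables.
def P1(x1, x2, x3, x4):
    return x1 * x2 + x3 * x3 + 2 * x4   # degree 2

def P2(x1, x2, x3, x4):
    return x1 + 3 * x2 + x4 + 1         # degree 1

p = 5
N = count_common_zeros([P1, P2], 4, p)
print(N, N % p)  # the theorem predicts that N is a multiple of p
```

Enumeration costs $p^n$ evaluations, so this is only a sanity check for small parameters and not a practical counting method.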














Since $\mathrm{char}(F) \geq 2$, we have the following corollary:


Corollary 9.25 Let $P_1, \ldots, P_m$ be as in Theorem 9.24. Then if there is one solution
in $F^n$ to (9.10), there must also exist at least one other solution.


Next, we give a useful lemma which shows that the zeroes of sparse polynomials 
cannot have too high a multiplicity. 


Lemma 9.26 [180], [120] Let $F$ be a finite field of prime order, and let $P \in F[t]$
be a non-zero polynomial of degree at most $|F| - 1$ with at most $k$ non-zero
coefficients. Then all the zeroes of $P$ in $F^\times$ are of order at most $k - 1$; in other
words, $P$ does not contain any factors of the form $(x - x_0)^k$ for any $x_0 \in F^\times$.


Proof We prove this by induction on $k$. The claim is trivial if $k = 1$, so suppose
$k > 1$ and the claim has already been proven for $k - 1$. Suppose that the
$x^j$ coefficient of $P$ was non-zero. If $P$ contained a zero of order at least $k$ in
$F^\times$, the (formal) derivative $P'$ must then contain a zero of order at least $k - 1$,
and so $xP' - jP$ must also contain a zero of order at least $k - 1$. But $xP' - jP$
is a non-trivial polynomial with at most $k - 1$ non-zero coefficients, contradicting
the induction hypothesis. Thus all the zeroes of $P$ in $F^\times$ are of order at
most $k - 1$.
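To experiment with Lemma 9.26, one can compute the order of a zero by repeated synthetic division. This is a minimal sketch, assuming $F = \mathbf{F}_p$ and polynomials given as coefficient lists with the lowest degree first; the function name is illustrative.

```python
def root_multiplicity(coeffs, r, p):
    """Order of vanishing at x = r of a non-zero polynomial over F_p.

    coeffs lists the coefficients from lowest to highest degree.
    """
    coeffs = [c % p for c in coeffs]
    mult = 0
    while len(coeffs) > 1:
        # Horner / synthetic division by (x - r)
        quotient = [0] * (len(coeffs) - 1)
        carry = 0
        for k in range(len(coeffs) - 1, 0, -1):
            carry = (coeffs[k] + carry * r) % p
            quotient[k - 1] = carry
        remainder = (coeffs[0] + carry * r) % p
        if remainder != 0:
            break
        coeffs = quotient
        mult += 1
    return mult

# (x - 1)^2 (x + 1) = x^3 - x^2 - x + 1 over F_7: four non-zero coefficients,
# and the zero at x = 1 has order 2, within the bound k - 1 = 3.
print(root_multiplicity([1, -1, -1, 1], 1, 7))
```

In the same spirit, any binomial $a x^i + b x^j$ of degree less than $p$ should have only simple zeroes in $\mathbf{F}_p^\times$, which is the case $k = 2$ of the lemma.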














Exercises 


9.4.1 Let $R$ be a commutative ring containing 1, and let $R[t]^{\mathrm{monic}}$ be the
multiplicative semigroup of all monic polynomials in $R[t]$ (polynomials with
leading coefficient 1). We say that a monic polynomial is irreducible if
it has no proper monic factors. Using the Euclidean algorithm, show that
every monic polynomial can be uniquely factored into monic irreducible
factors, up to permutations. In particular this shows that $F[t]$ is a unique
factorization domain whenever $F$ is a field.

9.4.2 Let $F$ be a finite field. Define the von Mangoldt function
$\Lambda : F[t]^{\mathrm{monic}} \to \mathbf{R}$ by setting $\Lambda(f) := \deg(g)$ if $f = g^k$ for some irreducible $g$ and
some $k \geq 1$, and $\Lambda(f) := 0$ otherwise. Using Exercise 9.4.1, show that
$\deg(f) = \sum_{g \in F[t]^{\mathrm{monic}} : g | f} \Lambda(g)$ for all $f \in F[t]^{\mathrm{monic}}$, where we use $g | f$
to denote that $g$ is a factor of $f$. Conclude in particular

$$\sum_{f \in F[t]^{\mathrm{monic}}} \frac{\deg(f)}{|F|^{s \deg(f)}} = \left( \sum_{f \in F[t]^{\mathrm{monic}}} \frac{\Lambda(f)}{|F|^{s \deg(f)}} \right) \left( \sum_{f \in F[t]^{\mathrm{monic}}} \frac{1}{|F|^{s \deg(f)}} \right)$$

for all $s > 1$. From this, conclude the prime number theorem for $F[t]$:

$$\sum_{f \in F[t]^{\mathrm{monic}} : \deg(f) = k} \Lambda(f) = |F|^k \text{ for all } k \geq 1.$$

From this, conclude Bertrand's postulate for $F[t]$: for every $k \geq 1$ there
exists at least one irreducible monic polynomial in $F[t]^{\mathrm{monic}}$ of degree $k$.
Also, establish the Riemann hypothesis for $F[t]$:

$$|\{f \in F[t]^{\mathrm{monic}} : \deg(f) = k, f \text{ irreducible}\}| = |F|^k/k + O(|F|^{k/2}).$$

Note that this is considerably easier to establish than the corresponding
Riemann hypothesis for $\mathbf{Z}$!

9.4.3 Let $F$ be a finite field of order $|F| = p^k$ for some prime $p$ and some
$k \geq 1$. Let $f(t) \in \mathbf{F}_p[t]$ be a polynomial over $\mathbf{F}_p$ such that $f(t) \,|\, t^{p^k} - t$.
Show that $f(t)$ has exactly $\deg(f)$ distinct zeroes in $F$. (Hint: if $t^{p^k} - t =
f(t) g(t)$, the zeroes of $t^{p^k} - t$ are the union of the zeroes of $f(t)$ and the
zeroes of $g(t)$.) In the language of Galois theory, this means that every
factor of $t^{p^k} - t$ splits completely over $F$.

9.4.4 Let $F$ be a finite field and $k \geq 1$ be an integer. Let $f(t) \in F[t]$ be a monic
irreducible polynomial of degree $k$ (which exists by Exercise 9.4.2). Show
that the quotient ring $F[t]/(f(t))$ is a finite field of order $|F|^k$. Show that
this finite field is isomorphic (as an additive group only) to the vector space
$F^k$. Note that this construction shows that there exists a field of order $p^k$
for any prime $p$ and any $k \geq 1$.

9.4.5 Let $F$ be a finite field of order $|F| = p^k$ for some prime $p$ and some
$k \geq 1$. Let $\omega$ be a primitive element of $F^\times$. Let $f(t) \in \mathbf{F}_p[t]$ be the
minimal polynomial of $\omega$ over $\mathbf{F}_p$, i.e. the monic polynomial in $\mathbf{F}_p[t]$
of minimal degree such that $f(\omega) = 0$. Show that $\deg(f) = k$, and that
the vectors $1, \omega, \ldots, \omega^{k-1}$ form a basis for $F$, viewed as a vector space
over $\mathbf{F}_p$.

9.4.6 Let $F$ and $G$ be two finite fields of the same order $|F| = |G| = p^k$. Prove
that $F$ and $G$ are isomorphic. (Hint: let $\omega$ be a primitive element of $F^\times$,
and let $f(t)$ be the minimal polynomial of $\omega$. Use Exercise 9.4.3 to find
$\omega' \in G^\times$ such that $f(\omega') = 0$, and then find a field isomorphism between
$F$ and $G$ which maps $\omega$ to $\omega'$.)

9.4.7 Let $F = \mathbf{F}_{p^k}$ be a finite field of characteristic $p$, and let $\phi : F \to F$
be the Frobenius map $\phi(x) := x^p$. Show that $\phi$ is a field isomorphism.
Furthermore, show that the iterates $\phi^0, \phi^1, \ldots, \phi^{k-1}$ of this map are the
only field isomorphisms of $F$ to itself.

9.4.8 Let $F = \mathbf{F}_{p^k}$ be a finite field, and let $1 \leq k' \leq k$. Show that the set $G :=
\{x \in F : x^{p^{k'}} = x\}$ is a subfield of $F$ of order $|G| = p^{\gcd(k, k')}$.

9.4.9 (Wilson's theorem) If $p$ is a prime, show that $(p - 1)! \cdot 1 = -1$ in $\mathbf{F}_p$.
(Hint: show that if $x \in \mathbf{F}_p^\times$ then $x = x^{-1}$ if and only if $x = \pm 1$.)

9.4.10 Show that Lemma 9.26 fails when $F$ is not of prime order. (Hint: if
$|F| = p^k$ with $k > 1$, consider the polynomial $x^p - 1 = (x - 1)^p$.)

9.4.11 Use Corollary 9.25 to give an alternative proof of Exercise 4.3.16 which 
does not use the Fourier transform. 





9.5 Davenport’s problem 


For a finite additive group $Z$, define the Davenport number $s = s(Z)$ of $Z$ to be the
smallest integer such that whenever $a_1, \ldots, a_s$ are elements of $Z$ (not necessarily
distinct), there exists a partial sum $\sum_{i \in I} a_i$ of the $a_i$ for some non-empty $I \subseteq [1, s]$
which sums to zero. The problem of determining $s(Z)$ for arbitrary groups $Z$ was
posed by Davenport in 1966. A simple estimate is


Lemma 9.27 If $Z$ is a finite additive group, then $s(Z) \leq |Z|$.


Proof Let $a_1, \ldots, a_{|Z|}$ be elements of $Z$; it suffices to show that some non-trivial
partial sum of these elements is zero. Consider the $|Z|$ partial sums
$a_1, a_1 + a_2, \ldots, a_1 + \cdots + a_{|Z|}$. If one of them is zero, we are done. Otherwise,
by the pigeonhole principle there exist two such partial sums which are equal.
Subtracting the shorter partial sum from the longer, we obtain the result.
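The pigeonhole argument above is constructive. The following sketch carries it out in the cyclic group $\mathbf{Z}_n$ (an assumption made for concreteness; the lemma itself holds in any finite additive group), returning a block of consecutive terms with zero sum. The function name is illustrative.

```python
def zero_sum_block(a, n):
    """Find i < j with a[i] + ... + a[j-1] == 0 (mod n).

    A block is guaranteed to exist when len(a) >= n, because the len(a) + 1
    prefix sums (including the empty one) take at most n distinct values.
    Returns None if no two prefix sums coincide.
    """
    seen = {0: 0}          # prefix-sum value -> number of terms summed
    total = 0
    for j, x in enumerate(a, start=1):
        total = (total + x) % n
        if total in seen:
            return seen[total], j
        seen[total] = j
    return None

a = [3, 5, 1, 4, 2, 6, 3]
i, j = zero_sum_block(a, 7)
print(a[i:j], sum(a[i:j]) % 7)
```

Treating the empty prefix sum as one of the pigeons merges the two cases of the proof (a partial sum equal to zero, or two equal partial sums) into a single lookup.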














In 1961, Erdős, Ginzburg and Ziv [88] proved the following remarkable variant. 


Theorem 9.28 [88] Let $Z$ be a finite additive group, and let $a_1, \ldots, a_{2|Z|-1}$ be
elements of $Z$. Then there exists $I \subseteq [1, 2|Z| - 1]$ with $|I| = |Z|$ such that
$\sum_{i \in I} a_i = 0$.


Proof Let us start with the special case when $Z = \mathbf{Z}_p$ is a cyclic group of prime
order. In this case we use the Chevalley-Warning theorem to derive the claim. Let
$F = \mathbf{F}_p = \mathbf{Z}_p$ and let $P_1, P_2 \in F[t_1, \ldots, t_{2p-1}]$ be the polynomials

$$P_1(t_1, \ldots, t_{2p-1}) := \sum_{i=1}^{2p-1} a_i t_i^{p-1}; \quad P_2(t_1, \ldots, t_{2p-1}) := \sum_{i=1}^{2p-1} t_i^{p-1}.$$

Observe that $\deg(P_1) + \deg(P_2) = 2(p - 1) < 2p - 1$, and that $(0, \ldots, 0)$ is a
simultaneous root of $P_1$ and $P_2$, and hence by Corollary 9.25 we can find another
simultaneous root $(y_1, \ldots, y_{2p-1}) \neq 0$ of $P_1$ and $P_2$. But by (9.8) we see that
$\sum_{i=1}^{2p-1} y_i^{p-1} = |\{i \in [1, 2p - 1] : y_i \neq 0\}| \cdot 1$. The claim then follows by setting
$I := \{i \in [1, 2p - 1] : y_i \neq 0\}$.

In the general case, we induct on $|Z|$. If $|Z|$ is prime then we are already done,
so suppose that $|Z| = pm$ for some prime $p$ and some $1 < m < |Z|$. Then (using
Corollary 3.8 if necessary) we can find a surjective homomorphism $\phi : Z \to \mathbf{Z}_p$
whose kernel $G := \ker(\phi)$ is a subgroup of $Z$ of order $m$. Since we have already
proven the theorem for $\mathbf{Z}_p$, we see that for any sequence of $2p - 1$ elements of
$Z$, we can already obtain a subsequence of size $p$ whose sum lies in $G$. By the greedy
algorithm, we can thus locate $2m - 1$ disjoint subsets $I_1, \ldots, I_{2m-1}$ of cardinality
$p$ inside $[1, 2|Z| - 1]$ such that $\sum_{i \in I_j} a_i \in G$ for each $1 \leq j \leq 2m - 1$. Now write
$\sum_{i \in I_j} a_i = b_j$. By induction hypothesis we can find a subset $J \subseteq [1, 2m - 1]$ of
cardinality $m$ such that $\sum_{j \in J} b_j = 0$. The claim now follows by setting
$I := \bigcup_{j \in J} I_j$.
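For small cyclic groups Theorem 9.28 can be confirmed by exhaustive search. The sketch below assumes $Z = \mathbf{Z}_n$ with integer representatives and checks every multiset of $2n - 1$ residues; it is a brute-force verification, not the Chevalley-Warning argument of the proof, and the function names are illustrative.

```python
from itertools import combinations, combinations_with_replacement

def egz_subset(a, n):
    """Return indices of n terms of a summing to 0 mod n, or None if there are none."""
    for I in combinations(range(len(a)), n):
        if sum(a[i] for i in I) % n == 0:
            return I
    return None

def egz_holds(n):
    """Check the theorem for Z_n over all multisets of 2n - 1 residues."""
    return all(
        egz_subset(a, n) is not None
        for a in combinations_with_replacement(range(n), 2 * n - 1)
    )

print(egz_holds(3), egz_holds(4))
# The length 2n - 1 cannot be lowered: n - 1 zeros and n - 1 ones have no such subset.
print(egz_subset([0] * 3 + [1] * 3, 4))
```

Since the order of the terms is irrelevant, it is enough to range over multisets rather than over all sequences.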


From considering the sequence $1, \ldots, 1$ and Lemma 9.27 we see that $s(\mathbf{Z}_p) = p$
for any prime $p$, and more generally that

$$s(\mathbf{Z}_{p^{k_1}} \oplus \cdots \oplus \mathbf{Z}_{p^{k_l}}) \geq 1 + \sum_{i=1}^{l} (p^{k_i} - 1) \tag{9.12}$$

for any prime $p$ and any $k_1, \ldots, k_l \geq 1$ (see Exercise 9.5.1).
Olson [266] proved that this bound is sharp. Let us first see this in the case
$k_1 = \cdots = k_l = 1$, by modifying the proof of Theorem 9.28.


Proposition 9.29 For any $l \geq 1$ and any prime $p$, we have $s(\mathbf{Z}_p^l) = 1 + l(p - 1)$.


Proof By (9.12) it suffices to prove the upper bound. Write $F := \mathbf{Z}_p$. Consider
a sequence $a_1, \ldots, a_n \in F^l$ where $n \geq 1 + l(p - 1)$. Each $a_i$ can be viewed
as an $l$-dimensional vector and we write $a_i = (a_{i1}, \ldots, a_{il})$. Let
$P_1, \ldots, P_l \in F[t_1, \ldots, t_n]$ be the polynomials $P_j(t_1, \ldots, t_n) := \sum_{i=1}^n a_{ij} t_i^{p-1}$ for $1 \leq j \leq l$;
then $\sum_{j=1}^l \deg(P_j) = l(p - 1) < n$. Since $(0, \ldots, 0)$ is a simultaneous zero of
$P_1, \ldots, P_l$, we thus see from Corollary 9.25 that there must exist another simultaneous
zero $(y_1, \ldots, y_n) \neq (0, \ldots, 0)$. Setting $I := \{i \in [1, n] : y_i \neq 0\}$, we conclude
using (9.8) as before that $\sum_{i \in I} a_i = 0$, as desired.


This simple argument does not directly extend to the general groups considered 
in (9.12); nevertheless, Olson was able to proceed by a different argument. 


Theorem 9.30 [266] Let $p$ be a prime and $k_1, \ldots, k_l \geq 1$. Then the inequality
(9.12) in fact holds with equality.


Proof Again it suffices to prove the upper bound. It is convenient to use
multiplicative notation. Let $G$ be an abelian multiplicative group which is isomorphic
to the additive group $\mathbf{Z}_{p^{k_1}} \oplus \cdots \oplus \mathbf{Z}_{p^{k_l}}$, let $n \geq 1 + \sum_{i=1}^l (p^{k_i} - 1)$, and let
$g_1, \ldots, g_n \in G$. It will suffice to find a non-empty $I \subseteq [1, n]$ such that $\prod_{i \in I} g_i = 1$.

Let $R$ be the group ring of $G$ over $\mathbf{Z}_p$ (i.e. $R$ is the space of formal linear
combinations of elements of $G$ with coefficients in $\mathbf{Z}_p$). In this ring we claim that

$$(1 - g_1) \cdots (1 - g_n) = 0.$$

To see this, let $x_1, \ldots, x_l$ be the standard basis for $G$, where $x_i$ has order $p^{k_i}$. Each $g_j$
can be written as the product of a few $x_i$'s. We use the identity $1 - xy = (1 - x) +
x(1 - y)$ iteratively to write $1 - g_j$ as a linear combination (with coefficients in
$R$) of the elements $1 - x_i$. Thus, it follows that the product $(1 - g_1) \cdots (1 - g_n)$ is
a linear combination of elements of the form $\prod_{i=1}^l (1 - x_i)^{n_i}$ where $\sum_{i=1}^l n_i = n >
\sum_{i=1}^l (p^{k_i} - 1)$. There must be some $j$ such that $n_j \geq p^{k_j}$. On the other hand, in $R$,
$(1 - x_j)^{p^{k_j}} = 1 - x_j^{p^{k_j}} = 0$. It follows that $(1 - g_1) \cdots (1 - g_n) = 0$, as claimed.
This implies that some non-trivial subsequence of the $g_i$ has product 1, because
otherwise, the coefficient of 1 in the product $(1 - g_1) \cdots (1 - g_n)$ would be non-zero.
This proves Theorem 9.30.
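The group-ring identity at the heart of this proof can be observed directly in the simplest case. The sketch below assumes $G$ is cyclic of order $N = p^k$ (so $l = 1$), and models the group ring as $\mathbf{Z}_p[x]/(x^N - 1)$ with elements stored as coefficient lists; the names are illustrative only.

```python
def multiply_by_one_minus_g(elem, a, N, p):
    """Multiply elem (a length-N coefficient list over Z_p) by 1 - x**a in Z_p[x]/(x**N - 1)."""
    out = list(elem)
    for k, c in enumerate(elem):
        out[(k + a) % N] = (out[(k + a) % N] - c) % p
    return out

def olson_product(exponents, N, p):
    """Compute the product of (1 - g_i), where g_i = x**exponents[i], in the group ring."""
    elem = [1] + [0] * (N - 1)      # the identity element 1
    for a in exponents:
        elem = multiply_by_one_minus_g(elem, a, N, p)
    return elem

# Z_9 with p = 3: any n >= 1 + (9 - 1) = 9 factors give the zero element ...
print(olson_product([1, 2, 4, 5, 7, 8, 1, 1, 2], 9, 3))
# ... while 8 factors need not: (1 - x)**8 is non-zero.
print(olson_product([1] * 8, 9, 3))
```

Only the vanishing of the product is used in the proof; the final step reads off the coefficient of the identity element, which would be non-zero if no non-trivial subsequence had product 1.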














This allows us to prove variants of Theorem 9.28 for product groups. For 
instance: 


Lemma 9.31 [266] Let $Z := \mathbf{Z}_p^2$, where $p$ is a prime. For any sequence
$a_1, \ldots, a_{3p-2} \in Z$, one can find a subsequence of length at most $p$ whose sum
is zero.


Proof Embed $Z$ in $Z' := \mathbf{Z}_p^3$, and consider the sequence $x + a_1, \ldots, x + a_{3p-2}$, where $x$ is an
element of $Z' \setminus Z$. By Theorem 9.30 (or Proposition 9.29) we have $s(Z') = 1 + 3(p - 1) = 3p - 2$,
and thus some non-empty subsequence of $x + a_1, \ldots, x + a_{3p-2}$ has sum zero. Rearranging
subscripts, we may assume that $(x + a_1) + \cdots + (x + a_n) = 0$, where $1 \leq n \leq
3p - 2$. This implies that $nx = 0$ and $a_1 + \cdots + a_n = 0$. It follows that $n = p$
or $n = 2p$. If $n = p$ then we are done. If $n = 2p$, we apply Theorem 9.30 or
Proposition 9.29 again, this time to the group $Z$. As $s(Z) = 1 + 2(p - 1) = 2p - 1$, the sequence
$a_1, \ldots, a_{n-1}$ contains a non-empty subsequence whose sum is zero. Again by rearranging
subscripts, we may assume that $a_1 + \cdots + a_m = 0$ where $m \leq n - 1$. If $m \leq p$
then we are done. If $m > p$, then the sequence $a_{m+1}, \ldots, a_n$ has length less than
$p$ and its sum is also zero since $a_1 + \cdots + a_n = 0$. The proof is complete.





By this Lemma and an induction argument similar to that used to prove Theo- 
rem 9.28 one can then obtain the following estimate on the Davenport number of 
product groups: 


Theorem 9.32 [266] Let $Z$ and $W$ be additive groups such that $|W|$ divides $|Z|$.
Then $s(Z \oplus W) \leq |Z| + |W| - 1$.


We leave the proof of this theorem to Exercise 9.5.2. 

Finally, let us briefly discuss the version of Davenport’s problem when the 
elements in the sequence are different. Under this condition, the magnitude of the 
Davenport number changes dramatically. Szemerédi [347] proved 


Theorem 9.33 There is a constant $c$ such that the following holds. Let $S =
\{a_1, \ldots, a_s\}$ be a sequence of $s$ different elements of $\mathbf{Z}_p$, where $p$ is a prime
and $s \geq c\sqrt{p}$. Then there is a non-empty subsequence of $S$ whose elements sum
up to zero.


A more recent result of Hamidoune and Zemor [175] showed that one can set
$c = \sqrt{2} + o(1)$, which is asymptotically best possible.

Assume that $A \subseteq \mathbf{Z}_p$ does not contain 0 and view the elements of $A$ as integers
between 1 and $p - 1$. It is clear that if $\sum_{a \in A} a < p$ then no subset of $A$ sums
up to 0. In [349, 352], Szemerédi and Vu showed that if $A$ has sufficiently many
elements, then this is essentially the only reason.


Theorem 9.34 Let $A$ be a subset of $\mathbf{Z}_p$, where $p$ is a large prime. Assume that no
subset of $A$ sums up to 0. Then there is a subset $A'$ of $A$ with at most $p^{0.49}$ elements
and a non-zero element $x \in \mathbf{Z}_p$ such that the sum of the elements in $x \cdot (A \setminus A')$
(viewed as positive integers between 1 and $p - 1$) is less than $p$.


For another classification result of this kind, see Theorem 12.20. The approach
to these two results relies on inverse arguments, in the spirit of those discussed in
Chapter 12.


Exercises 


9.5.1 Prove (9.12). 

9.5.2 By modifying the inductive argument in the proof of Theorem 9.28, 
deduce Theorem 9.32 from Lemma 9.31. 

9.5.3 Let $n$ be a positive integer, and let $\mathbf{Z}[i]$ be the ring of Gaussian integers.
Show that a sequence of $2n - 1$ Gaussian integers contains a subsequence
of length $n$ whose sum is divisible by $n$.




9.5.4 Let Z be an additive group of order n and let k be a positive integer 
divisible by n. Prove that for any sequence of elements of Z of length 
k +n — 1 there is a subsequence of length divisible by h whose sum is 
0. 

9.5.5 [6] Let $p$ be a prime, and let $v_1, \ldots, v_{3p} \in \mathbf{Z}_p^2$ be such that $\sum_{i=1}^{3p} v_i = 0$.
Then there is a subset $J \subseteq [1, 3p]$ such that $|J| = p$ and $\sum_{j \in J} v_j = 0$.
(Hint: modify the argument in Lemma 9.31.)


9.6 Kemnitz’s conjecture 


Define a parameter $s(n, d)$ as the smallest integer $s$ such that any sequence of
$s$ elements from $\mathbf{Z}_n^d$ contains a subsequence of length $n$ whose sum is 0 in $\mathbf{Z}_n^d$.
In this terminology, Theorem 9.28 states that $s(n, 1) = 2n - 1$. Harborth [176]
considered the problem of controlling $s(n, d)$ for higher $d$. He first observed the
easy estimates

$$(n - 1)2^d + 1 \leq s(n, d) \leq (n - 1)n^d + 1 \tag{9.13}$$

and also derived a recursive inequality

$$s(nk, d) \leq s(n, d) + n(s(k, d) - 1); \tag{9.14}$$

we leave the proofs as exercises.
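For very small parameters the quantity $s(n, d)$ can be found by exhaustive search, which gives a concrete check on (9.13). The following sketch is a brute-force computation under the definition above, with elements of $\mathbf{Z}_n^d$ stored as tuples; the function names are hypothetical, and the running time grows far too quickly for anything beyond toy cases.

```python
from itertools import combinations, combinations_with_replacement, product

def has_zero_sum_subsequence(seq, n, d):
    """True if some n terms of seq (tuples in Z_n^d) sum to zero in every coordinate."""
    return any(
        all(sum(v[c] for v in sub) % n == 0 for c in range(d))
        for sub in combinations(seq, n)
    )

def s_brute_force(n, d):
    """Smallest s such that every sequence of s elements of Z_n^d has a zero-sum subsequence of length n."""
    group = list(product(range(n), repeat=d))
    s = n
    while True:
        if all(
            has_zero_sum_subsequence(seq, n, d)
            for seq in combinations_with_replacement(group, s)
        ):
            return s
        s += 1

print(s_brute_force(2, 1), s_brute_force(3, 1), s_brute_force(2, 2))
```

For $n = 2$ the two bounds in (9.13) coincide, so the search should return $2^d + 1$; for $d = 1$ it should return $2n - 1$, in agreement with Theorem 9.28.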

Exact computation of $s(n, d)$ is a difficult task, especially for large $d$; for
instance the quantity $s(3, d)$ is closely related to the still unsolved problem of
obtaining sharp constants for Roth's theorem in $\mathbf{Z}_3^d$ (see Exercise 10.2.4). But in
the case when $d$ is fixed and $n$ is large, more is known. Alon and Dubiner [6]
proved that $s(n, d) = O_d(n)$. Kemnitz conjectured that the lower bound in (9.13)
is sharp for $d = 2$:


Conjecture 9.35 [200] For any n > 1, we have s(n, 2) = 4n — 3. 


In [200] the conjecture was verified when the prime factors of $n$ are from
the set $\{2, 3, 5, 7\}$. Alon and Dubiner [6] proved that $s(n, 2) \leq 6n - 5$. They also
sketched an argument which gives $s(p, 2) \leq 5p - 2$ for all sufficiently large primes
$p$.
Rónyai [286] made significant progress by proving


Theorem 9.36 [286] For every prime $p$ we have $s(p, 2) \leq 4p - 2$.

Theorem 9.36 and (9.14) imply that $s(n, 2) \leq \frac{41}{10} n$; see Exercise 9.6.3.


Proof The case $p = 2$ is trivial so we can assume that $p$ is odd. Set $m := 4p - 2$,
and let $v_1 = (a_1, b_1), \ldots, v_m = (a_m, b_m)$. By Exercise 9.5.5, it suffices to show that
there is a subset $J \subseteq \{1, \ldots, m\}$ with $|J| = p$ or $|J| = 3p$ such that $\sum_{j \in J} v_j = 0$.
Assume, for contradiction, that there is no such $J$. Let $\sigma, P_1, P_2 \in \mathbf{F}_p[t_1, \ldots, t_m]$
be the polynomials

$$\sigma(t_1, \ldots, t_m) := \sum_{I \subseteq [1,m], |I| = p} \prod_{i \in I} t_i$$

$$P_1(t_1, \ldots, t_m) := \left(1 - \Big(\sum_{i=1}^m a_i t_i\Big)^{p-1}\right) \left(1 - \Big(\sum_{i=1}^m b_i t_i\Big)^{p-1}\right) \left(1 - \Big(\sum_{i=1}^m t_i\Big)^{p-1}\right) \left(2 - \sigma(t_1, \ldots, t_m)\right)$$

$$P_2(t_1, \ldots, t_m) := \prod_{i=1}^m (1 - t_i)$$

and set $P := P_1 - 2P_2$.

We now claim that $P(x_1, \ldots, x_m) = 0$ whenever $x_1, \ldots, x_m \in \{0, 1\} \subset \mathbf{F}_p$.
There are several cases to consider, depending on the size of the set $J := \{i \in
[1, m] : x_i = 1\}$. When $J$ is empty then it is easy to see that $P_1(x_1, \ldots, x_m) = 2$
and $P_2(x_1, \ldots, x_m) = 1$. When $J$ is non-empty, then $P_2(x_1, \ldots, x_m)$ is zero. To see
that $P_1(x_1, \ldots, x_m)$ is also zero, we observe from (9.8) that $1 - (\sum_{i=1}^m x_i)^{p-1} = 0$
when $|J|$ is not divisible by $p$, and from Wilson's theorem (Exercise 9.4.9)
that $\sigma(x_1, \ldots, x_m) = 2$ when $|J| = 2p$. Finally, when $|J| = p$ or $|J| = 3p$,
we have by hypothesis that $\sum_{i=1}^m (a_i, b_i) x_i = \sum_{j \in J} (a_j, b_j) \neq 0$, and hence
$(1 - (\sum_{i=1}^m a_i x_i)^{p-1})(1 - (\sum_{i=1}^m b_i x_i)^{p-1}) = 0$.

Thus $P$ vanishes on $\{0, 1\} \times \cdots \times \{0, 1\}$. Also, $\deg(P_1) = 4p - 3$ and
$\deg(P_2) = m = 4p - 2$, thus $\deg(P) = 4p - 2$. Moreover, the coefficient of the
monomial $t_1 \cdots t_m$ in $P$ is $-2(-1)^m \cdot 1 \neq 0$. This contradicts the combinatorial
Nullstellensatz and concludes the proof.














Remark 9.37 In the above proof one only used a very special case of the combi- 
natorial Nullstellensatz; indeed one could rely just on Exercise 9.1.5, which can 
be proven by more elementary means — in fact, this was the original approach in 
[286]. 


Remark 9.38 Very recently, Reiher [279] has proved Kemnitz's conjecture, using
the Chevalley-Warning theorem combined with a clever combinatorial argument.


For a further survey of results in this area, see [81]. 




Exercises 


9.6.1 [176] Prove (9.13). (Hints: for the upper bound, use the pigeonhole principle.
For the lower bound, take $n - 1$ copies of $\{0, 1\}^d$.)

9.6.2 [176] Prove (9.14). (Hint: modify the inductive argument in the proof of
Theorem 9.28.)

9.6.3 Use Theorem 9.36 and (9.14) to deduce that $s(n, 2) \leq \frac{41}{10} n$. (Hint: first
verify the claim when all the prime divisors of $n$ are less than 11, and
then induct on $n$.)


9.6.4 [6] Modify the proof of Lemma 9.31 to prove that $s(n, d) = O_d(n)$.


9.7 Stepanov’s method 


In this section we fix a finite field $F$, and fix a multiplicative subgroup $G$ of $F^\times$.
The multiplicative structure of $G$ can be determined explicitly:


Lemma 9.39 Let $G$ be a subgroup of $F^\times$. Then $|G|$ divides $|F^\times|$; thus we
have $|F^\times| = |F| - 1 = |G| h$ for some $h \geq 1$. Furthermore we have the explicit
formulas

$$G = \{x \in F^\times : x^{|G|} = 1\} = \{y^h : y \in F^\times\}, \tag{9.15}$$

and if $G^\perp \subseteq F^\times$ denotes the orthogonal complement group $G^\perp := \{\xi \in F^\times :
\xi^h = 1\}$, then $G^\perp$ indexes the multiplicative cosets $x \cdot G$ of $G$. Indeed if we define
$G_\xi := \{x \in F^\times : x^{|G|} = \xi\}$ for all $\xi \in G^\perp$, then the sets $\{G_\xi : \xi \in G^\perp\}$ partition
$F^\times$, and one has $x \cdot G = G_{x^{|G|}}$ for all $x \in F^\times$.


We leave the easy verification of this lemma to Exercise 9.7.1. In this section
however we shall be more concerned with understanding the additive structure of
$G$. A convenient way of quantifying this structure is via the sets $A(\xi) \subseteq F$ defined
for all $\xi \in G^\perp$ by

$$A(\xi) := \{x \in F : x^{|G|} = (x - 1)^{|G|} = \xi\} = G_\xi \cap (G_\xi + 1).$$

It is clear that these sets are disjoint as $\xi$ ranges over $G^\perp$. The relevance of these
sets to the additive structure of $G$ lies in the easily verified identity

$$|G \cap (G + x)| = |(G - g) \cap \pm G_\xi| = |A(\xi^{-1})| \text{ whenever } \xi \in G^\perp, x \in G_\xi, g \in G; \tag{9.16}$$

see Exercise 9.7.2. As a consequence of (9.16) we have the following identities,
whose verification we leave to Exercise 9.7.3.


Lemma 9.40 We have $\sum_{\xi \in G^\perp} |A(\xi)| = |\bigcup_{\xi \in G^\perp} A(\xi)| = |G| - 1$ and $E(G, G) =
|G|^2 + |G| \sum_{\xi \in G^\perp} |A(\xi)|^2$, where $E(G, G)$ is the additive energy of $G$. If $-1 \in
G^\perp$, then we have $|A(-\xi)| = |A(\xi)|$ for all $\xi \in G^\perp$.
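The objects $G$, $G^\perp$ and $A(\xi)$ are easy to enumerate in a small prime field, which gives a quick check of the first two identities in Lemma 9.40. The sketch below assumes $F = \mathbf{F}_p$ with integer representatives and a subgroup specified by its order; the function names are illustrative and not from the text.

```python
def subgroup_data(p, order):
    """Return (G, G_perp) for the subgroup G of F_p^x of the given order (order must divide p - 1)."""
    assert (p - 1) % order == 0
    h = (p - 1) // order
    G = {x for x in range(1, p) if pow(x, order, p) == 1}
    G_perp = {x for x in range(1, p) if pow(x, h, p) == 1}
    return G, G_perp

def A_set(xi, p, order):
    """A(xi) = {x in F_p : x^|G| = (x - 1)^|G| = xi}; x = 0 and x = 1 are excluded since xi != 0."""
    return {x for x in range(2, p)
            if pow(x, order, p) == xi and pow(x - 1, order, p) == xi}

def additive_energy(G, p):
    """E(G, G): the number of (a, b, c, d) in G^4 with a - b = c - d in F_p."""
    counts = {}
    for a in G:
        for b in G:
            d = (a - b) % p
            counts[d] = counts.get(d, 0) + 1
    return sum(c * c for c in counts.values())

p, order = 13, 3
G, G_perp = subgroup_data(p, order)
sizes = {xi: len(A_set(xi, p, order)) for xi in sorted(G_perp)}
print(sorted(G), sizes)
print(sum(sizes.values()), len(G) - 1)
print(additive_energy(G, p), len(G) ** 2 + len(G) * sum(s * s for s in sizes.values()))
```

Each of the last two printed lines should show a pair of equal numbers, matching the two identities of the lemma.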


In [337] Stepanov introduced a method for controlling various additive expressions
involving $G$ and related objects such as $|A(\xi)|$. For simplicity we shall restrict
our attention just to the task of obtaining upper bounds on $|A(\xi)|$, following [180].
The idea is to use elementary linear algebra to construct a sparse polynomial $P$
which vanishes to high order on several of the sets $A(\xi)$. One then applies tools
such as Lemma 9.26 to obtain a non-trivial bound. We illustrate this method with
the following result of Heath-Brown and Konyagin, which gives distributional
information on the sizes of the $|A(\xi)|$.


Theorem 9.41 [180] Let $F = \mathbf{F}_p$ be a finite field of prime order, and let $G$ be a
multiplicative subgroup of $F^\times$. Let $G^\perp$ and $A$ be defined as above. Then for any
set $\Gamma \subseteq G^\perp$ with $|\Gamma| = O(|F|^3/|G|^4)$, we have

$$\sum_{\xi \in \Gamma} |A(\xi)| = O\left(\min\left(|G|, |G|^{2/3} |\Gamma|^{2/3}\right)\right).$$

Proof Let 0 < c <« 1 bea small absolute constant to be chosen later. We may 
assume that G is large, |G| > c~!©°, since the claim is trivial otherwise. Similarly 
we may assume that T is non-empty and that |T| < ¢!|F|>/|G|*, since the claim 
for |T| = @(|F|>/|G|*) then follows by partitioning F into O(1) sets of size at 
most c!| F|3/|G|4. 

When |I| = Q(|G|!/7) then the claim already follows from Lemma 9.40, so we 

may assume that |T| < c!”|G|!/. Let us define the normalized quantities 


A = [PGPP]; B = La GAS]; 
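observe from our hypotheses on $|\Gamma|$ that we have the bounds

$$1 \leq B \leq A; \quad AB \leq |G|; \quad A^2 |\Gamma| \leq c A B^2; \quad A + 2|G|B < |F| \tag{9.17}$$

if $c$ is chosen suitably small. By the disjointness of the $A(\xi)$, it then suffices to
show that

$$\left| \bigcup_{\xi \in \Gamma} A(\xi) \right| = O\left(1 + \frac{|G| B}{A}\right). \tag{9.18}$$
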


We now let $V \subseteq F[t]$ be the linear subspace (over $F$) of $F[t]$ generated by the
$AB^2$ polynomials $t^{a + b|G|}(t - 1)^{b'|G|}$ where $0 \leq a < A$ and $0 \leq b, b' < B$. We first
observe that $V$ has large dimension:


Lemma 9.42 $V$ has linear dimension exactly $AB^2$ over $F$.




Proof Suppose for contradiction that $V$ had dimension less than $AB^2$. Then we
could find coefficients $c_{a,b,b'} \in F$, not all zero, such that

$$\sum_{0 \leq a < A} \sum_{0 \leq b < B} \sum_{0 \leq b' < B} c_{a,b,b'} \, t^{a + b|G|} (t - 1)^{b'|G|} = 0.$$

We may assume that there is at least one non-zero coefficient $c_{a,b,0}$, otherwise
we could divide out by $(t - 1)^{|G|}$. But then the polynomial
$\sum_{0 \leq a < A} \sum_{0 \leq b < B} c_{a,b,0} \, t^{a + b|G|}$ would have a zero of order $|G|$ at $t = 1$. On the other hand,
this polynomial is non-zero and its Newton diagram contains at most $AB$ points,
which contradicts Lemma 9.26 and (9.17).














We then exploit this large dimension to locate a polynomial which vanishes to
high order on $\bigcup_{\xi \in \Gamma} A(\xi)$.


Lemma 9.43 $V$ contains a non-zero polynomial $P$ which vanishes to order $A$ at
all elements of $\bigcup_{\xi \in \Gamma} A(\xi)$.


Proof It is convenient to use an algebraic geometry perspective and work via
commutative rings. Let $R$ be the commutative ring over $F$ generated by indeterminates
$t, t^{-1}, s, s^{-1}, r, \varepsilon$ subject to the constraints

$$t t^{-1} = s s^{-1} = 1; \quad s = t - 1; \quad t^{|G|} = s^{|G|} = r; \quad \prod_{\xi \in \Gamma} (r - \xi) = 0; \quad \varepsilon^A = 0; \tag{9.19}$$

in other words, $R$ is the polynomial ring $F[t, t^{-1}, s, s^{-1}, r, \varepsilon]$ quotiented out by
the ideal generated by the polynomials $t t^{-1} - 1$, $s s^{-1} - 1$, $s - t + 1$, $t^{|G|} - r$,
$s^{|G|} - r$, $\prod_{\xi \in \Gamma} (r - \xi)$, and $\varepsilon^A$. Let $\iota : F[t] \to R$ be the ring homomorphism that
maps $t$ to $t + \varepsilon$. We shall show that the image $\iota(V)$ of $V$ has linear dimension
strictly less than $AB^2$. By Lemma 9.42, this will force the existence of a non-zero
polynomial $P \in V$ such that $\iota(P) = 0$; in other words we can find $Q_1, \ldots, Q_7 \in
F[t, t^{-1}, s, s^{-1}, r, \varepsilon]$ such that

$$P(t + \varepsilon) = Q_1 (t t^{-1} - 1) + Q_2 (s s^{-1} - 1) + Q_3 (s - t + 1) + Q_4 (t^{|G|} - r) + Q_5 (s^{|G|} - r) + Q_6 \prod_{\xi \in \Gamma} (r - \xi) + Q_7 \varepsilon^A$$

for any indeterminates $t, t^{-1}, s, s^{-1}, r, \varepsilon$. Restricting this to $r := \xi \in \Gamma$, $t := x \in
A(\xi) \subseteq F^\times$, $s := x - 1 \in F^\times$, $t^{-1} := x^{-1} \in F^\times$, $s^{-1} := (x - 1)^{-1} \in F^\times$, $\varepsilon \in
F$, we obtain

$$P(x + \varepsilon) = Q_7(x, x^{-1}, x - 1, (x - 1)^{-1}, \xi, \varepsilon) \, \varepsilon^A$$

which shows that $P$ vanishes to order $A$ at $x$, which is an arbitrary element of
$\bigcup_{\xi \in \Gamma} A(\xi)$.

It remains to bound the linear dimension of $\iota(V)$. Observe that this space is generated
by the polynomials $\iota(t^{a + b|G|}(t - 1)^{b'|G|}) = (t + \varepsilon)^a (t + \varepsilon)^{b|G|} (s + \varepsilon)^{b'|G|}$. But
by the Taylor expansion of $(t + \varepsilon)^{b|G|}$ and using the constraints (9.19), we have

$$(t + \varepsilon)^{b|G|} = t^{b|G|} \left(1 + \binom{b|G|}{1} t^{-1} \varepsilon + \binom{b|G|}{2} t^{-2} \varepsilon^2 + \cdots \right) = r^b \left(1 + \binom{b|G|}{1} t^{-1} \varepsilon + \cdots + \binom{b|G|}{A-1} t^{-(A-1)} \varepsilon^{A-1} \right).$$

In particular we see that $(t + \varepsilon)^{b|G|}$ is equal in $R$ to a polynomial expression in
$t, t^{-1}, s, s^{-1}, r, \varepsilon$ of degree $O(A)$. Similarly for $(t + \varepsilon)^a$ and $(s + \varepsilon)^{b'|G|}$. Thus
$\iota(V)$ lies in the space of polynomials in $t, t^{-1}, s, s^{-1}, r, \varepsilon$ of degree at most $O(A)$.
Taking out a common denominator of $(ts)^{-2A}$, we obtain a space of polynomials
in $t, s, r, \varepsilon$ of degree at most $O(A)$. The variable $s$ can be eliminated since $s = t - 1$
from (9.19). The variable $r$ is limited to have degree at most $|\Gamma|$, again by (9.19).
This shows that the dimension of $\iota(V)$ is at most $O(|\Gamma| A^2)$, which by (9.17) is indeed
less than the dimension $AB^2$ of $V$, as desired.














Let $P$ be as in Lemma 9.43. Since $P \in V$, we have $\deg(P) \leq A + 2|G|B < |F|$
thanks to (9.17). Since $P$ can have at most $\deg(P)$ zeroes (counting multiplicity)
in $F$, we obtain

$$A \left| \bigcup_{\xi \in \Gamma} A(\xi) \right| \leq A + 2|G|B,$$

which gives (9.18) as desired.





Theorem 9.41 can already be used to give non-trivial sum set bounds on $G$, for
instance via controlling the additive energy $E(G, G)$. In fact we can also control
the additive energy $E(A, A)$ of subsets of $G$:


Lemma 9.44 [44] Let $F = \mathbf{F}_p$ be a finite field of prime order, and let $G$ be a
multiplicative subgroup of $F^\times$ of order $|G| = O(|F|^{3/4})$. Let $A$ be an additive set
in $G$. Then we have

$$E(A, A) = O(|G| |A|^{3/2}). \tag{9.20}$$

Comparing this with (2.7) we see that this bound is non-trivial when $|A| >
|G|^{2/3}$. See also Corollary 2.62.


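Proof For every $\xi \in G^\perp$, we define the counting function $\alpha(\xi)$ by

$$\alpha(\xi) := |\{(a_1, a_2) \in A \times A : a_1 - a_2 \in G_\xi\}|.$$

We observe that

$$E(A, A) = |\{(a_1, a_2, a_3, a_4) \in A \times A \times A \times A : a_1 - a_2 = a_3 - a_4\}|$$
$$= |A|^2 + \sum_{\xi \in G^\perp} |\{(a_1, a_2, a_3, a_4) \in A \times A \times A \times A : a_1 - a_2 = a_3 - a_4 \in G_\xi\}|$$
$$\leq |A|^2 + \sum_{\xi \in G^\perp} \alpha(\xi) \sup_{d \in G_\xi} |\{(g_1, g_2) \in G \times G : g_1 - g_2 = d\}|$$
$$= |A|^2 + \sum_{\xi \in G^\perp} \alpha(\xi) |A(\xi^{-1})|$$

thanks to (9.16). Since $|A|^2 = |A|^{1/2} |A|^{3/2} = O(|G| |A|^{3/2})$, it thus suffices to show
that

$$\sum_{\xi \in G^\perp} \alpha(\xi) |A(\xi^{-1})| = O(|G| |A|^{3/2}).$$

From the identity

$$\sum_{\xi \in G^\perp} \alpha(\xi) = |A|^2 - |A| \leq |A|^2$$

we see that it suffices to show that

$$\sum_{\xi \in G^\perp : |A(\xi^{-1})| \geq |G| |A|^{-1/2}} \alpha(\xi) |A(\xi^{-1})| = O(|G| |A|^{3/2}).$$

But from (9.16) we also have the trivial bound

$$\alpha(\xi) \leq |A| \sup_{g_1 \in G} |\{g_2 \in G : g_1 - g_2 \in G_\xi\}| = |A| |A(\xi^{-1})|$$

and so it suffices to show that

$$\sum_{\xi \in G^\perp : |A(\xi^{-1})| \geq |G| |A|^{-1/2}} |A(\xi^{-1})|^2 = O(|G| |A|^{1/2}).$$

But if we order $G^\perp = \{\xi_1, \ldots, \xi_M\}$ in decreasing order of $|A(\xi_j^{-1})|$, then by
Theorem 9.41 we then have

$$|A(\xi_j^{-1})| = O\left(\min\left(|G|, |G|^{2/3} j^{-1/3}\right)\right) \text{ for all } 1 \leq j \leq M,$$

which implies that

$$\sum_{\xi \in G^\perp : |A(\xi^{-1})| \geq |G| |A|^{-1/2}} |A(\xi^{-1})|^2 \leq \sum_{j = O(|A|^{3/2}/|G|)} O\left(|G|^{4/3} j^{-2/3}\right) = O(|G| |A|^{1/2})$$

as desired.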





As a consequence we can now give a sum-product estimate which improves 
somewhat on the results in Section 2.8. 




Theorem 9.45 [44] Let $F = \mathbf{F}_p$ be a finite field of prime order, and let $A$ be an
additive set in $F^\times$. Let $Q[A] = \frac{A - A}{(A - A) \setminus \{0\}}$ be the quotient set of $A$, as defined in
Definition 2.49. Then there exists $\xi \in Q[A]$ such that

$$|A \pm \xi \cdot A| \geq c \min\left(|F|, \frac{|A|^{5/2}}{|A + A|}, \frac{|A|^3}{|A \cdot A|}\right)$$

for either choice of sign $\pm$.


Proof If $|A| > |F|^{1/2}$ then the claim follows from Corollary 2.51, so suppose
$|A| \leq |F|^{1/2}$. Let $D$ be the set of popular quotients,

$$D := \left\{d \in F^\times : |\{(a', a'') \in A \times A : a'/a'' = d\}| \geq \frac{2|A|^2}{9|A \cdot A|}\right\},$$

and let $G$ be the multiplicative group generated by $D$. Then by the multiplicative
version of Exercise 2.6.10, there exists a coset $\xi_0 \cdot G$ of $G$ for some $\xi_0 \in F^\times$ such
that $|A \cap (\xi_0 \cdot G)| \geq |A|/3$. By dividing $A$ by $\xi_0$ we may assume that $\xi_0 = 1$.


Lemma 9.46 Let $H \subseteq G$ be the set of those $\xi \in G$ such that

$$|A + \xi \cdot A| \geq \min\left(\frac{|A|^2 |G|}{|A|^2 + |G|}, \frac{2|A|^3}{9|A \cdot A|}\right).$$

Then $H \cap Q[A]$ is non-empty.


Proof Suppose for contradiction that $H$ and $Q[A]$ are disjoint. From Exercise
2.8.4 there exists a $\xi \in G$ such that $|A + \xi \cdot A| \geq \frac{|A|^2 |G|}{|A|^2 + |G|}$, and hence $H$ is
non-empty. Thus, $G \setminus Q[A]$ is non-empty, and is also a proper subset of $G$ (since
$1 \in Q[A] \cap G$). Next, observe that if $\xi \in G \setminus Q[A]$ and $d \in D$, then by Lemma
2.50, all the sums in $A + \xi \cdot A$ are distinct, and hence

$$|A + (\xi d) \cdot A| \geq |A| |A \cap (d \cdot A)| \geq \frac{2|A|^3}{9|A \cdot A|}.$$

This shows that $D \cdot (G \setminus Q[A]) \subseteq H$. Since $H \subseteq G$ and $H$ and $Q[A]$ are disjoint,
we conclude $D \cdot (G \setminus Q[A]) \subseteq G \setminus Q[A]$; since $D$ generates $G$, this implies that
$G \cdot (G \setminus Q[A]) \subseteq G \setminus Q[A]$. But this contradicts the previous observation that $G \setminus Q[A]$
was a proper non-empty subset of $G$.














Let $\xi$ be as in the above lemma; thus

$$|A + \xi \cdot A| \geq c \min\left(|G|, |A|^2, \frac{|A|^3}{|A \cdot A|}\right).$$

Note that since $|A \cdot A| \geq |A|$, we can drop the $|A|^2$ term from the right-hand
side. We will now be done unless $|G| \leq c|A|^{5/2}/|A + A|$ for some small $c > 0$.
Since $|A + A| \geq |A|$ and $|A| \leq |F|^{1/2}$, we have $|G| \leq c|A|^{3/2} \leq c|F|^{3/4}$. But then,
from Theorem 9.41, (2.8) and the fact that $|A \cap G| \geq |A|/3$ we see that if $|G| =
O(|F|^{3/4})$, then

$$|A + A| \geq |(A \cap G) + (A \cap G)| \geq \frac{|A \cap G|^4}{E(A \cap G, A \cap G)} \geq c |A \cap G|^{5/2}/|G| \geq c' |A|^{5/2}/|G|,$$

a contradiction.


Exercises 


9.7.1 Prove Lemma 9.39. (Hint: in Section 9.4 it was demonstrated that $F^\times$ is
a cyclic group of order $|F^\times| = |F| - 1$.)

9.7.2 Prove (9.16).

9.7.3 Prove Lemma 9.40. (Hint: use (9.16) and Lemma 2.9.)

9.7.4 [44] Let $F = \mathbf{F}_p$ be a finite field of prime order, and let $A$ be an additive
set in $F^\times$ such that $|A| \leq |F|^{1/2}$. Using Theorem 9.45, prove that
$|A \cdot (A - A) + A \cdot (A - A)| = \Omega(|A|^{5/4})$. Use this to derive another
proof of Corollary 2.58.


9.8 Cyclotomic fields, and the uncertainty principle 


We now recall some of the elementary theory of cyclotomic fields $\mathbf{Q}(\omega)$, and apply
this to obtain an uncertainty principle for the Fourier transform on $\mathbf{Z}_p$.


Definition 9.47 (Cyclotomic field) Let $n \geq 1$ be any positive integer. An $n$th root
of unity is any complex number $\omega \in \mathbf{C}$ such that $\omega^n = 1$. An $n$th root of unity $\omega$ is
said to be primitive if $\omega$ is not an $m$th root of unity for any $1 \leq m < n$. We define the
cyclotomic field of order $n$ to be the field $\mathbf{Q}(\omega)$ obtained by adjoining a primitive
$n$th root of unity to the rationals $\mathbf{Q}$. We define the $n$th cyclotomic polynomial
$\Phi_n \in \mathbf{C}[z]$ to be the polynomial $\Phi_n(z) := \prod_\omega (z - \omega)$, where $\omega$ ranges over the
primitive $n$th roots of unity.


It is easy to see that for each $n$, there are $\phi(n)$ primitive $n$th roots of unity, and they
are all powers of each other. Thus there is only one cyclotomic field $\mathbf{Q}(\omega)$ for each
order $n$. In particular we see that $\Phi_n$ is a monic polynomial of degree $\phi(n)$. Some
further basic properties of $\Phi_n$ are as follows.




Lemma 9.48 $\Phi_n$ has integer coefficients (thus $\Phi_n \in \mathbf{Z}[z]$), and is irreducible in
$\mathbf{Z}[z]$. Furthermore we have $\Phi_n(1) = p$ when $n$ is a prime power $n = p^k$, $\Phi_1(1) =
0$, and $\Phi_n(1) = 1$ otherwise.


Proof We first observe from the factor theorem that 


l= I] (z — w) for anyn > 1. 


aio"=1 


Since every nth root of unity is a primitive dth root of unity for some d, we obtain 


2"-1=]] baz). (9.21) 

d\n 
Thus one can obtain ®,, (z) by factoring out [ ]4y),.y<, Pa(z) from z” — 1. By an easy 
induction on n this implies that ®, is a monic polynomial with integer coefficients. 
Since (z” — 1)/®,(z) = (z” — 1)/(z — 1) approaches n as z > 1, we obtain the 


formula 
n= I] ®,(1). 
d|n;d>1 


Taking logarithms and using Exercise 9.4.1 we conclude that 


X Ad) = > log ba(1) 


d\|n;d>1 d|n;d>1 


for all n > 1, where A(d) := log p when d is a prime power d = p* for some 
k > 1, and A(d) = 0 otherwise (cf. Exercise 1.10.6). Another easy induction on n 
then shows that ®,(1) = e4™ for all n > 1, which gives the desired formula for 
®,,(1). 
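
The recursion implicit in (9.21) is easy to run by machine. The following is a minimal sketch, not part of the original text: it computes $\Phi_n$ by dividing $z^n - 1$ by $\Phi_d$ for each proper divisor $d$ of $n$, and then evaluates at 1. The helper names `poly_divide_exact`, `cyclotomic` and `value_at_one` are illustrative choices, and polynomials are stored as coefficient sequences with the lowest degree first.

```python
from functools import lru_cache


def poly_divide_exact(num, den):
    """Divide integer polynomials given as coefficient lists, lowest degree first.

    Assumes den is monic and divides num exactly, so all arithmetic stays in Z.
    """
    num = list(num)
    quot = [0] * (len(num) - len(den) + 1)
    for shift in range(len(quot) - 1, -1, -1):
        coeff = num[shift + len(den) - 1]  # den is monic, so no division is needed
        quot[shift] = coeff
        for i, d in enumerate(den):
            num[shift + i] -= coeff * d
    assert not any(num), "division was not exact"
    return quot


@lru_cache(maxsize=None)
def cyclotomic(n):
    """Coefficients of Phi_n (lowest degree first), obtained from (9.21)."""
    poly = [-1] + [0] * (n - 1) + [1]  # z^n - 1
    for d in range(1, n):
        if n % d == 0:
            poly = poly_divide_exact(poly, cyclotomic(d))
    return tuple(poly)


def value_at_one(poly):
    """Evaluate a polynomial at z = 1, i.e. sum its coefficients."""
    return sum(poly)
```

For instance, `value_at_one(cyclotomic(8))` and `value_at_one(cyclotomic(9))` return the primes 2 and 3, while `value_at_one(cyclotomic(12))` returns 1, in line with the formula for $\Phi_n(1)$ in Lemma 9.48.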

Now we prove the irreducibility. When $n$ is prime this can be easily verified from Eisenstein’s criterion (Exercise 9.8.3), but the general case is trickier. We use an argument of Gauss. Suppose for contradiction that $\Phi_n$ is reducible in $\mathbf{Z}[z]$; then we can partition the primitive nth roots of unity into two disjoint non-empty classes $A$ and $B$ such that the monic polynomials $f(z) := \prod_{\omega \in A}(z - \omega)$ and $g(z) := \prod_{\omega \in B}(z - \omega)$ lie in $\mathbf{Z}[z]$. Of course we have $\Phi_n = fg$. Since any two primitive nth roots are powers of each other, we can find an $\omega \in A$ such that $\omega^m \in B$ for some integer $m$. By decomposing $m$ into primes and arguing by contradiction, we can in fact locate a prime $p$ and an $\omega \in A$ such that $\omega^p \in B$. This implies that the polynomials $f(z)$ and $g(z^p)$ have a common root, and hence by the Euclidean algorithm we can find a non-trivial monic polynomial $h(z) \in \mathbf{Z}[z]$ which divides both $f(z)$ and $g(z^p)$. This implies that $\Phi_n(z^p) = f(z^p)g(z^p)$ contains a factor of $h(z^p)h(z)$; by (9.21) we see that $z^{np} - 1$ also contains a factor of $h(z^p)h(z)$.

Now we work in the finite field $F_p$. In that setting we have $h(z^p) = h(z)^p$ and $(z^n - 1)^p = z^{np} - 1$ (cf. Exercise 9.4.7) and hence $(z^n - 1)^p$ contains a factor of $h(z)^{p+1}$; in particular $z^n - 1$ must contain a factor of $h(z)^2$ in $F_p$ (cf. Exercise 9.4.1). Taking formal derivatives, this implies that $z^n - 1$ and $nz^{n-1}$ have a common factor of $h(z)$; but from the Euclidean algorithm and the fact that $n \ne 0 \pmod p$ we see that these polynomials have a greatest common divisor of 1, contradiction.














As a consequence of Lemma 9.48 we obtain a useful criterion for non-vanishing of polynomial expressions of roots of unity, which was already exploited in the proof of Theorem 9.20.


Lemma 9.49 Let $p$ be a prime and $q$ be a power of $p$. Let $P \in \mathbf{Z}[t_1, \ldots, t_k]$ be a polynomial with integer coefficients such that $P(z_1, \ldots, z_k) = 0$ for some qth roots of unity $z_1, \ldots, z_k$. Then the integer $P(1, \ldots, 1)$ is divisible by $p$.


Proof Let $\omega$ be a primitive qth root of unity; then $z_i = \omega^{n_i}$ for some non-negative integers $n_i$. If we let $Q(t) := P(t^{n_1}, \ldots, t^{n_k})$, then $Q(\omega) = 0$. Thus $Q(t)$ shares a root in common with the irreducible polynomial $\Phi_q(t)$, which must then be a factor of $Q(t)$. Thus $Q(1) = P(1, \ldots, 1)$ has $\Phi_q(1) = p$ as a factor.














We apply this lemma to prove a non-vanishing result on generalized Vandermonde determinants. We first need a coefficient computation.


Proposition 9.50 [355] Let $n_1, \ldots, n_k$ be non-negative integers, and let $P \in \mathbf{Z}[z_1, \ldots, z_k]$ be the polynomial

$$P(z_1, \ldots, z_k) := \sum_{\pi \in S_k} \mathrm{sgn}(\pi) \prod_{i=1}^{k} z_i^{n_{\pi(i)}}$$

(cf. (9.3)). Then we can factor $P = \Delta Q$, where $Q \in \mathbf{Z}[z_1, \ldots, z_k]$ is such that

$$Q(1, \ldots, 1) = \Delta(n_1, \ldots, n_k)/\Delta(0, \ldots, k-1).$$


Proof The expression $P(z_1, \ldots, z_k)$ can also be interpreted as the determinant of the $k \times k$ matrix $(z_i^{n_j})_{1 \le i,j \le k}$. This shows in particular that $P$ vanishes when any two of the $z_i$ are equal. Dividing out the factors of $z_i - z_j$ using long division and applying Definition 9.6 we conclude the existence of a polynomial $Q \in \mathbf{Z}[z_1, \ldots, z_k]$ such that $P = \Delta Q$. It remains to compute $Q(1, \ldots, 1)$. To do this we introduce the normalized differentiation operators $D_i := z_i \frac{\partial}{\partial z_i}$ and consider the expression $D_1^0 D_2^1 \cdots D_k^{k-1} P(1, \ldots, 1)$. We split $P$ into factors

$$P(z_1, \ldots, z_k) = \prod_{1 \le i < j \le k} (z_j - z_i) \times Q(z_1, \ldots, z_k)$$

and apply the Leibniz rule $D_i(fg) = (D_i f)g + f(D_i g)$ repeatedly. Observe that there are $\binom{k}{2}$ linear factors in the expression to be differentiated, all of which vanish at $(1, \ldots, 1)$. There are also $\binom{k}{2}$ derivatives to be applied. Thus the only terms in the Leibniz rule which do not vanish at $(1, \ldots, 1)$ are those in which all the derivatives land on the linear factors. Furthermore each derivative must land on a distinct linear factor to yield a non-zero term. But this means that each of the $D_k$ derivatives must land on one of the $z_k - z_i$ factors with $i < k$ (and there are $(k-1)!$ ways this can happen); similarly the $D_{k-1}$ derivatives must then land on one of the $z_{k-1} - z_i$ factors with $i < k - 1$ (with $(k-2)!$ ways this can happen), and so forth. We conclude that

$$D_1^0 D_2^1 \cdots D_k^{k-1} P(1, \ldots, 1) = (k-1)! \cdots 1!\,0!\; Q(1, \ldots, 1) = \Delta(0, \ldots, k-1)\, Q(1, \ldots, 1).$$

On the other hand, since each monomial $z_1^{n_1} \cdots z_k^{n_k}$ is an eigenfunction of $D_i$ with eigenvalue $n_i$, we see from the definition of $P$ that

$$D_1^0 D_2^1 \cdots D_k^{k-1} P(z_1, \ldots, z_k) = \sum_{\pi \in S_k} \mathrm{sgn}(\pi) \prod_{i=1}^{k} n_{\pi(i)}^{i-1} z_i^{n_{\pi(i)}}.$$

Substituting $z_1 = \cdots = z_k = 1$ and applying (9.3) we obtain

$$D_1^0 D_2^1 \cdots D_k^{k-1} P(1, \ldots, 1) = \Delta(n_1, \ldots, n_k).$$

Combining this with the previous identity, the claim follows.
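
As a sanity check on Proposition 9.50, the case $k = 2$ can be worked out by hand. The computation below is an illustration that is not in the original text. It takes $(n_1, n_2) = (0, n)$ with $n \ge 1$ and assumes the normalization $\Delta(z_1, z_2) = z_2 - z_1$ for the Vandermonde determinant; the opposite sign convention would leave the final ratio unchanged.

```latex
\begin{align*}
P(z_1, z_2) &= z_2^{\,n} - z_1^{\,n}
  = (z_2 - z_1) \sum_{a=0}^{n-1} z_1^{\,a} z_2^{\,n-1-a}
  = \Delta(z_1, z_2)\, Q(z_1, z_2), \\
Q(1, 1) &= \sum_{a=0}^{n-1} 1 = n = \frac{\Delta(0, n)}{\Delta(0, 1)}, \\
D_1^0 D_2^1 P(1, 1) &= \left. n z_2^{\,n} \right|_{z_2 = 1} = n = \Delta(0, n).
\end{align*}
```

The last line agrees with the two evaluations of $D_1^0 D_2^1 \cdots D_k^{k-1} P(1, \ldots, 1)$ used in the proof, since $\Delta(0, 1) = 1$ here.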





Combining Proposition 9.50 with Lemma 9.49 we obtain


Lemma 9.51 (Chebotarev’s lemma) Let $q = p^a$ be a prime power, let $1 \le k \le p$, and let $z_1, \ldots, z_k$ be distinct qth roots of unity. Let $n_1, \ldots, n_k$ be integers which are distinct modulo $p$. Then the $k \times k$ matrix $(z_i^{n_j})_{1 \le i,j \le k}$ has non-zero determinant.


Indeed, Chebotarev’s lemma follows since $\Delta(z_1, \ldots, z_k)$ is non-zero and $\Delta(n_1, \ldots, n_k)$ is not divisible by $p$: if the determinant $P(z_1, \ldots, z_k) = \Delta(z_1, \ldots, z_k) Q(z_1, \ldots, z_k)$ vanished, then $Q(z_1, \ldots, z_k)$ would vanish, and Lemma 9.49 would force $p$ to divide $Q(1, \ldots, 1)$ and hence $\Delta(n_1, \ldots, n_k) = \Delta(0, \ldots, k-1) Q(1, \ldots, 1)$. We note that while this result was proved by Chebotarev in 1926 (see [338]), it has been independently rediscovered and reproved a number of times [278], [71], [263], [102], [355], [120], [131]. As a consequence of this lemma, one easily establishes the following uncertainty principle for $\mathbf{Z}_p$:


Theorem 9.52 [355] Let $p$ be a prime number. Let $f : \mathbf{Z}_p \to \mathbf{C}$ be a non-zero random variable, and let $\hat f : \mathbf{Z}_p \to \mathbf{C}$ be its Fourier transform (using the standard bicharacter $e(x, \xi) = \exp(2\pi i x \xi / p)$). Then we have $|\mathrm{supp}(f)| + |\mathrm{supp}(\hat f)| \ge p + 1$. Conversely, if $A$ and $B$ are two non-empty subsets of $\mathbf{Z}/p\mathbf{Z}$ such that $|A| + |B| \ge p + 1$, then there exists a function $f$ such that $\mathrm{supp}(f) = A$ and $\mathrm{supp}(\hat f) = B$.
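
In the case $q = p$, Chebotarev’s lemma says that every square minor of the $p \times p$ Fourier matrix is non-zero. The following numerical sketch (an illustration only, using floating-point arithmetic and the hypothetical helper names `determinant` and `smallest_minor`) computes the smallest minor in absolute value for a given modulus. It can be used to contrast a prime modulus with a composite one, where minors can vanish.

```python
import cmath
from itertools import combinations


def determinant(matrix):
    """Determinant of a small square complex matrix by Gaussian elimination."""
    m = [list(row) for row in matrix]
    size = len(m)
    det = 1
    for col in range(size):
        # Partial pivoting: pick the largest entry in this column.
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0  # numerically singular
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return det


def smallest_minor(n):
    """Smallest |minor| over all square submatrices of the n x n Fourier matrix."""
    fourier = [
        [cmath.exp(2j * cmath.pi * j * k / n) for k in range(n)] for j in range(n)
    ]
    best = float("inf")
    for size in range(1, n + 1):
        for rows in combinations(range(n), size):
            for cols in combinations(range(n), size):
                sub = [[fourier[r][c] for c in cols] for r in rows]
                best = min(best, abs(determinant(sub)))
    return best
```

For $n = 5$ all minors stay bounded away from zero, whereas for $n = 4$ the minor on rows and columns $\{0, 2\}$ is the all-ones $2 \times 2$ matrix and vanishes. The latter reflects the fact that the indicator function of the subgroup $\{0, 2\}$ of $\mathbf{Z}_4$ has support and Fourier support both of size 2, so the bound $p + 1$ has no analog for this composite modulus.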


We leave the deduction of Theorem 9.52 from Lemma 9.51 to Exercise 9.8.9. 
This result should be compared with (4.21). As an application of this theorem 
we give yet another proof of the Cauchy—Davenport inequality, this proof being 
Fourier-analytic (or more precisely Fourier-algebraic) in nature. 




Theorem 9.53 (Cauchy–Davenport inequality, yet again) Let $F = F_p$ be a finite field of prime order. If $A, B$ are two additive sets in $F$, then

$$|A + B| \ge \min(|A| + |B| - 1, p).$$


Proof ([355] and Robin Chapman, private communication) Since $A$ and $B$ are non-empty, we may find two subsets $X$ and $Y$ of $\mathbf{Z}/p\mathbf{Z}$ such that $|X| = p + 1 - |A|$, $|Y| = p + 1 - |B|$, and $|X \cap Y| = \max(|X| + |Y| - p, 1)$. By Theorem 9.52 we may find a function $f$ such that $\mathrm{supp}(f) = A$ and $\mathrm{supp}(\hat f) = X$, and a function $g$ such that $\mathrm{supp}(g) = B$ and $\mathrm{supp}(\hat g) = Y$. Then $f * g$ has support contained in $A + B$ and has Fourier support equal to $X \cap Y$ (in particular, $f * g$ is non-zero), and hence by Theorem 9.52 again we have $|A + B| + |X \cap Y| \ge p + 1$. Since $|X| + |Y| - p = p + 2 - |A| - |B|$, this gives $|A + B| \ge \min(|A| + |B| - 1, p)$ as desired.
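
For small primes the Cauchy–Davenport inequality can also be confirmed exhaustively. The sketch below is not part of the original argument; the function names `sumset` and `cauchy_davenport_holds` are illustrative. It checks every pair of non-empty subsets of $\mathbf{Z}_n$ and also shows why primality of the modulus matters.

```python
from itertools import combinations


def sumset(a, b, n):
    """The sum set A + B inside Z_n."""
    return {(x + y) % n for x in a for y in b}


def cauchy_davenport_holds(n):
    """Check |A + B| >= min(|A| + |B| - 1, n) for all non-empty A, B in Z_n."""
    subsets = [s for size in range(1, n + 1) for s in combinations(range(n), size)]
    return all(
        len(sumset(a, b, n)) >= min(len(a) + len(b) - 1, n)
        for a in subsets
        for b in subsets
    )
```

The check succeeds for $n = 5$ and $n = 7$ but fails for $n = 6$, where the subgroup $A = B = \{0, 3\}$ has $|A + B| = 2 < 3$.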














One can iterate Theorem 9.52 to also apply to the group $\mathbf{Z}_p^n$ for any $n \ge 1$, which we endow with the standard bilinear form, as in Example 4.2.


Corollary 9.54 [249] Let $p$ be a prime, $n \ge 1$ be an integer, and $f : \mathbf{Z}_p^n \to \mathbf{C}$ be a non-zero random variable. Then we have

$$p^k |\mathrm{supp}(f)| + p^{n-k-1} |\mathrm{supp}(\hat f)| \ge p^n + p^{n-1}$$

for all $0 \le k \le n - 1$.


Remark 9.55 These bounds can be seen to be sharp in a large number of situations, by taking the Cartesian product of the examples in Theorem 9.52 with subgroups of $\mathbf{Z}_p^{n-1}$. It has a nice geometric interpretation: if one plots the point $(|\mathrm{supp}(f)|, |\mathrm{supp}(\hat f)|)$ in $\mathbf{Z} \times \mathbf{Z}$, then this point lies on or above the convex hull of the points $(p^j, p^{n-j})$ for $0 \le j \le n$, which correspond to the cases where $f$ is the indicator function of a subgroup of $\mathbf{Z}_p^n$; this convex hull should be contrasted with the hyperbola corresponding to (4.21). In [249], this result was generalized further to arbitrary finite additive groups $Z$, see Exercise 9.8.11.


Proof We prove this by induction on $n$. For $n = 1$ this is just Theorem 9.52. Now suppose that $n > 1$, and the Corollary has already been proven for all smaller values of $n$. Fix $f$. We parameterize $\mathbf{Z}_p^n$ as $x = (x', x_n)$, where $x' \in \mathbf{Z}_p^{n-1}$ and $x_n \in \mathbf{Z}_p$. If $g(\xi', x_n)$ is the Fourier transform of $f(x', x_n)$ in the $x'$ variable (with $x_n$ fixed), then $\hat f(\xi', \xi_n)$ is the Fourier transform of $g(\xi', x_n)$ in the $x_n$ variable (keeping $\xi'$ fixed).

Let $A \subseteq \mathbf{Z}_p$ be the set of all $x_n$ such that $f(\cdot, x_n)$ (and hence $g(\cdot, x_n)$) is not identically zero. Observe that $1 \le |A| \le p$ and

$$|\mathrm{supp}(f)| = \sum_{x_n \in A} |\mathrm{supp}(f(\cdot, x_n))|.$$

Thus by the pigeonhole principle there exists an $x_n \in A$ such that

$$|A|\,|\mathrm{supp}(f(\cdot, x_n))| \le |\mathrm{supp}(f)|. \qquad (9.22)$$

Fix this $x_n$. By induction we have

$$p^{k'} |\mathrm{supp}(f(\cdot, x_n))| + p^{n-k'-2} |\mathrm{supp}(g(\cdot, x_n))| \ge p^{n-1} + p^{n-2} \qquad (9.23)$$

for all $0 \le k' \le n - 2$. Also, for any $\xi'$ in the support of $g(\cdot, x_n)$, we see that $g(\xi', \cdot)$ is supported in $A$, so by Theorem 9.52

$$|\mathrm{supp}(\hat f(\xi', \cdot))| \ge p + 1 - |A|.$$

Summing this over all $\xi'$ in the support of $g(\cdot, x_n)$ we obtain

$$|\mathrm{supp}(\hat f)| \ge (p + 1 - |A|)\,|\mathrm{supp}(g(\cdot, x_n))|.$$

Combining this with (9.22) we obtain

$$p^k |\mathrm{supp}(f)| + p^{n-k-1} |\mathrm{supp}(\hat f)| \ge p^k |A|\,|\mathrm{supp}(f(\cdot, x_n))| + (p + 1 - |A|)\, p^{n-k-1} |\mathrm{supp}(g(\cdot, x_n))|.$$

When $|A|$ is equal to 1 or $p$ then the right-hand side here is at least $p^n + p^{n-1}$ thanks to (9.23). Since the right-hand side is linear in $|A|$, the same is true for the intermediate cases $1 < |A| < p$. This completes the induction.














Exercises 


9.8.1 Let $p$ be a prime and $k \ge 1$. Prove that $\Phi_p(z) = 1 + z + z^2 + \cdots + z^{p-1}$ and $\Phi_{p^k}(z) = \Phi_p(z^{p^{k-1}})$.

9.8.2 (Eisenstein’s criterion) Let $p$ be a prime, and let $P(t) = a_n t^n + \cdots + a_0 \in \mathbf{Z}[t]$ be such that $a_n$ is not divisible by $p$, that $a_{n-1}, \ldots, a_0$ are divisible by $p$, and $a_0$ is not divisible by $p^2$. Show that $P$ is irreducible in $\mathbf{Z}[t]$.

9.8.3 Let $p$ be a prime. Compute the polynomial $\Phi_p(t + 1)$ explicitly, and then use Eisenstein’s criterion to give a proof that $\Phi_p(t + 1)$, and hence $\Phi_p$ itself, is irreducible in $\mathbf{Z}[t]$, without using Lemma 9.48.

9.8.4 Let $n \ge 1$ be an integer, and suppose that $x \in F_p^\times$ is such that $\Phi_n(x) = 0$. Show that $\mathrm{ord}(x) = n$, and in particular $n$ divides $p - 1$.

9.8.5 Let $n, m$ be integers. Using Exercise 9.8.4, show that all the prime factors of $\Phi_n(m)$ are equal to 1 mod $n$ and are coprime to $m$. Using this (and modifying Euclid’s proof of the infinitude of primes) show that there are infinitely many primes equal to 1 mod $n$; this is a special case of Dirichlet’s theorem.

9.8.6 Let $n \ge 1$, and let $\omega$ be a primitive nth root of unity. Show that the cyclotomic field $\mathbf{Q}(\omega)$ is a $\phi(n)$-dimensional vector space over $\mathbf{Q}$, and that the complex numbers $1, \omega, \omega^2, \ldots, \omega^{\phi(n)-1}$ form a linear basis for $\mathbf{Q}(\omega)$.

9.8.7 Let $p$ be a prime, and let $\omega$ be a primitive pth root of unity. Let $\mathbf{Z}[\omega]$ be the ring generated by $\omega$. Show that the quotient ring $\mathbf{Z}[\omega]/((1 - \omega) \cdot \mathbf{Z}[\omega])$ is isomorphic to the field $F_p$. (Hint: exploit the fact that $\Phi_p(1) = p$, and hence $\Phi_p(\omega) - p$ contains a factor of $(1 - \omega)$.)

9.8.8 [120] Let $p$ be a prime, let $\omega$ be a primitive pth root of unity, let $z_1, \ldots, z_k$ be distinct pth roots of unity, and let $n_1, \ldots, n_k$ be distinct integers in $[0, p)$. Suppose there exists a polynomial $P \in \mathbf{Z}[\omega][z]$ of degree at most $p - 1$ which vanishes at $z_1, \ldots, z_k$ and has at most $k$ non-zero coefficients. Using Exercise 9.8.7 and Lemma 9.26, show that $P$ is a multiple of $(1 - \omega)$. Using this and an infinite descent argument, obtain another proof of Lemma 9.51 (at least in the case $q = p$, which is all one needs for Theorem 9.52).

9.8.9 [355] Deduce Theorem 9.52 from Lemma 9.51. (Hint: Lemma 9.51 implies that all the minors of the Fourier matrix $(e^{2\pi i jk/p})_{1 \le j,k \le p}$ are invertible.) Conversely, show that Theorem 9.52 implies the $q = p$ case of Lemma 9.51.

9.8.10 Let $p$ be a prime, let $G := \{z \in \mathbf{C} : z^p = 1\}$ be the pth roots of unity, and let $P \in \mathbf{C}[z]$ be a non-zero polynomial with $\deg(P) < p$. Show that the number of zeroes of $P$ in $G$ cannot exceed the number of non-zero coefficients in $P$.

9.8.11 [249] Given any finite additive group $Z$ and any real number $k$, let $\theta(Z; k)$ denote the quantity

$$\theta(Z; k) := \inf\{|\mathrm{supp}(\hat f)| : f \in L^2(Z); f \ne 0; |\mathrm{supp}(f)| \le k\}.$$

Show that for every subgroup $G$ of $Z$ and any $1 \le k \le |Z|$, we have the inequality

$$\theta(Z; k) \ge \sup_{st = k} \theta(G; s)\,\theta(Z/G; t)$$

by adapting the proof of Corollary 9.54. Conclude via an inductive argument that for any non-zero function $f$ in $L^2(Z)$, the lattice point $(|\mathrm{supp}(f)|, |\mathrm{supp}(\hat f)|)$ lies on or above the convex hull of the points $(|G|, |Z|/|G|)$ as $G$ ranges over all subgroups of $Z$.


10 





Szemerédi’s theorem for k = 3 


A surprisingly fruitful and deep problem in additive combinatorics is that of deter- 
mining whether a given set A contains non-trivial (i.e. proper) arithmetic progres- 
sions of a given length. We have already seen some special cases of this problem; 
in Section 4.7 we saw that sum sets such as A+ A, A+ A+ A, or 2A—2A 
contained very long arithmetic progressions (and generalized arithmetic progres- 
sions), while in Section 6.3 we saw that if we colored a large finite group (or a large 
interval of integers) into a small number of color classes, then one of the color 
classes must necessarily contain a long arithmetic progression. In this chapter and 
the next we shall discuss perhaps one of the deepest theorems known to additive 
combinatorics, namely Szemerédi’s theorem: 


Theorem 10.1 (Szemerédi’s theorem) [345] Let $A$ be a subset of the positive integers with positive upper density¹ $\overline{\sigma}(A) > 0$. Then $A$ contains arbitrarily long arithmetic progressions.


This theorem was originally proved by Szemerédi in 1975 by a sophisticated 
combinatorial argument, introducing for the first time the powerful Szemerédi reg- 
ularity lemma, which we discuss in Section 10.6. There are several other deep and 
important proofs of this theorem, including the ergodic-theoretic proof of Fursten- 
berg [125], the additive combinatorial proof of Gowers [138], and the hypergraph 
regularity proofs of Gowers [140] and Nagle, Rödl, Schacht, and Skokan [254], 
[282], [283], [284]. These proofs will be discussed in the next chapter. 

One can formulate Szemerédi’s theorem in a more quantitative manner, using 
the following definition. 


Definition 10.2 (Erdős–Turán constant) [99] Let $A$ be an additive set, and let $k \ge 1$. We let $r_k(A)$ denote the size of the largest subset of $A$ which does not contain any proper arithmetic progressions of length $k$.


1 Upper and lower density were defined in Definition 1.21.


Examples 10.3 We have $r_1(A) = 0$ and $r_2(A) = 1$ for any additive set $A$. Clearly $r_k(A)$ is non-decreasing in $A$, and we have the trivial bound $r_k(A) \le |A|$ for any $A$. If $A$ lives in a p-torsion group (e.g. $A \subseteq F_p^n$) then $r_k(A) = |A|$ for all $k > p$, since a proper progression in such a group has at most $p$ distinct terms.
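
For very small intervals the quantity $r_k([1, N])$ can be computed by exhaustive search, which gives a concrete feel for Definition 10.2. The following sketch is an illustration only: the function names `has_proper_ap` and `r_k_interval` are hypothetical, and the search is exponential in $N$, so it is only usable for tiny cases.

```python
def has_proper_ap(elements, k):
    """True if the integer set contains a k-term progression with non-zero step."""
    s = set(elements)
    if k == 1:
        return bool(s)  # any single element is a progression of length 1
    return any(
        all(a + j * (b - a) in s for j in range(k))
        for a in s
        for b in s
        if b > a
    )


def r_k_interval(k, n):
    """Largest size of a subset of {1, ..., n} with no proper k-term progression."""
    best = 0
    for mask in range(1 << n):
        subset = [i + 1 for i in range(n) if mask >> i & 1]
        if len(subset) > best and not has_proper_ap(subset, k):
            best = len(subset)
    return best
```

For example, `r_k_interval(3, 5)` returns 4, attained by the set $\{1, 2, 4, 5\}$, while `r_k_interval(2, 5)` returns 1 and `r_k_interval(1, 4)` returns 0, matching Examples 10.3.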


Theorem 10.1 is then easily shown to be equivalent to the following version, 
which was first conjectured by Erdős and Turan [99]. 


Theorem 10.4 (Szemerédi’s theorem, second formulation) Let $k \ge 1$ and $N \ge 1$. Then $r_k([1, N]) = o_{N \to \infty; k}(N)$ and $r_k(\mathbf{Z}_N) = o_{N \to \infty; k}(N)$.


One in fact has the following generalization:


Theorem 10.5 (Szemerédi’s theorem, in an arbitrary group) Let $k \ge 1$ and let $Z$ be a finite additive group. Then $r_k(Z) = o_{|Z| \to \infty; k}(|Z|)$.


This generalization either follows from the density Hales—Jewett theorem [124] 
or from the hypergraph proofs of Szemerédi’s theorem [140], [254], [282], [283], 
[284], and will be discussed in Section 11.6. 

A further famous conjecture of Erdős and Turán remains open: 


Conjecture 10.6 (Erdős–Turán conjecture) [99] Let $A \subseteq \mathbf{Z}^+$ be such that $\sum_{n \in A} \frac{1}{n} = \infty$. Then $A$ contains arbitrarily long proper arithmetic progressions.


Up to very small factors, such as log?” N, this conjecture is essentially equiv- 
alent to asking for r;({1, N]) = O(N / log N) for all k and N (Exercise 10.0.6). 
This conjecture remains unsolved even for progressions of length 3 (though see 
Theorem 10.30 below). However a special case of this conjecture, restricted to the 
prime numbers P = {2,3,5,...}, has recently been proven by Green and Tao: 


Theorem 10.7 (Green–Tao theorem) [158] Let $k \ge 1$ and $N > 1$. Then $r_k(P \cap [1, N]) = o_{N \to \infty; k}(|P \cap [1, N]|)$. In particular, the primes contain arbitrarily long arithmetic progressions.


Note from (1.48) that the sum $\sum_{p \in P} \frac{1}{p}$ is divergent.

For general k, Szemerédi’s theorem and the Green—Tao theorem are rather 
involved and will be treated in Chapter 11. However, the k = 3 case is amenable 
to Fourier-analytic methods, and we have the following famous theorem of Roth: 


Theorem 10.8 (Roth’s theorem) [287] We have $r_3([1, N]), r_3(\mathbf{Z}_N) = o_{N \to \infty}(N)$ for all $N \ge 1$. More generally, for any finite additive group $Z$ of odd order we have $r_3(Z) = o_{|Z| \to \infty}(|Z|)$.


The generalization to arbitrary additive groups $Z$ of odd order is due to Meshulam [248]. Note that the restriction that $|Z|$ be odd is necessary, since for 2-torsion groups, there are no proper progressions of length three and hence $r_3(Z) = |Z|$ in that case.




Both Roth’s theorem and Szemerédi’s theorem have a surprising diversity of 
different proofs, using such techniques as harmonic analysis, ergodic theory, graph 
theory, hypergraph theory, inverse sum set theory, and Ramsey theory. However, 
they all revolve around a fundamental dichotomy, namely the dichotomy between 
arithmetically structured sets (e.g. arithmetic progressions, Bohr sets, sets of small 
doubling, sets of large additive energy, almost periodic sets) and arithmetically 
unstructured sets (e.g. random sets, pseudo-random sets, “mixing” sets). The point 
is that one needs very different arguments to deal with either of the two cases, and so 
any proof of the above theorems must first decompose a general set somehow into 
a structured component and an unstructured one. To make such a decomposition 
rigorous, one needs some powerful tools, for instance from harmonic analysis, 
ergodic theory, or graph theory. 

The purpose of this chapter is to give several proofs of Roth’s theorem, both for 
general Z and in special cases, and to also discuss some variants of this theorem. 
These proofs serve as models for the more difficult Szemerédi and Green—Tao 
theorems, to be discussed in the next chapter. It turns out that linear Fourier 
analysis (as developed in Chapter 4) is a particularly well adapted tool to detect 
progressions of length 3; as we shall see however in the next chapter, progressions 
of longer length will require a quadratic or higher-order Fourier analysis. 


Exercises 


10.0.1 Establish the inequalities

$$r_k([1, N/k)) \le r_k(\mathbf{Z}_N) \le r_k([1, N])$$

for any $N \ge k \ge 1$. This shows that the two forms $r_k(\mathbf{Z}_N) = o_{N \to \infty; k}(N)$ and $r_k([1, N]) = o_{N \to \infty; k}(N)$ of Theorem 10.4 are equivalent.

10.0.2 Show that Theorem 10.4 is equivalent to Theorem 10.1. (Hint: to deduce 
Theorem 10.1 from Theorem 10.4 is rather easy. For the converse direc- 
tion, argue by contradiction, obtaining dense subsets of [1, N] without 
any proper arithmetic progressions, and paste those subsets together in 
some suitable way to contradict Theorem 10.1.) 

10.0.3 Show that Theorem 10.1 is equivalent to the statement that every subset of the integers of positive upper density contains infinitely many progressions of length $k$, for each $k \ge 1$.

10.0.4 Show that Szemerédi’s theorem implies van der Waerden’s theorem 
(Exercise 6.3.7). 

10.0.5 Give an example to show that if the positive integers $\mathbf{Z}^+$ are partitioned into two color classes, then it is not necessarily the case that one of the color classes contains an infinitely long proper arithmetic progression $a + \mathbf{Z}^+ \cdot r$. Thus the properties of containing arbitrarily long proper arithmetic progressions, and infinitely long proper arithmetic progressions, are distinct.

10.0.6 Show that the Erdős–Turán conjecture is equivalent to the absolute convergence of the sum

$$\sum_{n=1}^{\infty} \frac{r_k([1, 2^n])}{2^n}.$$

10.0.7 Show that if $A$ and $B$ are additive sets which are Freiman isomorphic of order 2, then $r_k(A) = r_k(B)$ for all $k$.

10.0.8 If $A$ and $B$ are additive sets (possibly in different groups), show that $r_k(A \times B) \ge r_k(A) r_k(B)$.

10.0.9 If $Z, Z'$ are two finite additive groups, show that $r_k(Z \times Z') \le r_k(Z)|Z'|$.

10.0.10 Show that to prove Theorem 10.5 for arbitrary groups $Z$, it suffices to verify it for cyclic groups $\mathbf{Z}_N$ and for vector spaces $\mathbf{Z}_p^n$ over fields of prime order. (Hint: use Corollary 3.8 and the previous exercise.) A similar claim applies of course to Roth’s theorem.

10.0.11 Let $n \ge 1$. Define a capset of order $n$ to be any subset of the vector space $F_3^n$ over the finite field $F_3$ which contains no (affine) lines. Show that the largest possible cardinality of a capset of order $n$ is $r_3(F_3^n)$. Using Exercise 10.0.8, show that $r_3(F_3^n) \ge 2^n$.

10.0.12 If $Z$ is a finite additive group whose order is coprime to $k!$, show that $r_k(Z) \le (1 - \frac{1}{k})|Z|$. (Hint: if $A \subseteq Z$ has cardinality greater than $(1 - \frac{1}{k})|Z|$, choose $a \in Z$, $r \in Z \setminus \{0\}$ randomly and consider the probability of the events $a + jr \notin A$ for $j = 0, 1, \ldots, k - 1$.)


10.1 General strategy 


In this section we make some general observations concerning progressions of 
length 3, and describe in high-level terms the various strategies one could employ 
to prove Roth-like theorems. 

Let us work in a fixed finite additive group Z of odd order, and let A be a subset 
of Z. We shall think of A as being rather dense, so that the density 0 < Pz(A) < 1 
is moderately large. Roth’s theorem is then an assertion that if |Z| is sufficiently 
large, then A must contain progressions of length three. 

To explain why this should be the case, it is convenient to introduce the trilinear form

$$\Lambda_3(f, g, h) := \mathbf{E}_{x, r \in Z} f(x) g(x + r) h(x + 2r) \qquad (10.1)$$

for any $f, g, h : Z \to \mathbf{C}$. Note in particular that

$$\Lambda_3(1_A, 1_A, 1_A) = \mathbf{P}_{x, r \in Z}(x, x + r, x + 2r \in A) \qquad (10.2)$$

so the quantity $\Lambda_3(1_A, 1_A, 1_A)$ measures the proportion of arithmetic progressions $(x, x + r, x + 2r)$ in $Z$ which are completely contained in $A$. Intuitively, if $A$ is “randomly” distributed, then the events $x \in A$, $x + r \in A$, $x + 2r \in A$ should be “independent”, and we then expect

$$\Lambda_3(1_A, 1_A, 1_A) \approx \mathbf{P}_{x, r \in Z}(x \in A)\, \mathbf{P}_{x, r \in Z}(x + r \in A)\, \mathbf{P}_{x, r \in Z}(x + 2r \in A) = \mathbf{P}_Z(A)^3. \qquad (10.3)$$

Thus if $A$ is fairly dense in $Z$, we expect $\Lambda_3(1_A, 1_A, 1_A)$ to be large. On the other hand, if $|Z|$ is odd and $A$ has no proper progressions of length 3, then the only progressions $(x, x + r, x + 2r)$ which can lie in $A$ are those for which $x \in A$ and $r = 0$, whence

$$\Lambda_3(1_A, 1_A, 1_A) = \mathbf{P}_Z(A)/|Z|. \qquad (10.4)$$


If |Z| is sufficiently large, this seems to be in conflict with the heuristic (10.3). 
Thus to prove Roth’s theorem it will suffice to establish some rigorous analog of 
(10.3). In particular, Roth’s theorem will be implied by the following result. 


Theorem 10.9 (Varnavides’ theorem) [372] Let $Z$ be a finite additive group of odd order. Then for any non-empty set $A \subseteq Z$ we have

$$\Lambda_3(1_A, 1_A, 1_A) = \Omega_{\mathbf{P}_Z(A)}(1).$$

In other words, we have $\Lambda_3(1_A, 1_A, 1_A) \ge c(\mathbf{P}_Z(A))$ where $c(\mathbf{P}_Z(A)) > 0$ depends only on the density $\mathbf{P}_Z(A)$ of $A$ and not on the group $Z$. More generally, if $f : Z \to \mathbf{R}^+$ is a non-negative function which is not identically zero, and obeying the bound $0 \le f(x) \le 1$ for all $x \in Z$, then

$$\Lambda_3(f, f, f) = \Omega_{\mathbf{E}_Z(f)}(1).$$


Note that Varnavides’ theorem is in fact a bit stronger than Roth’s theorem, as it implies that any subset of $Z$ of density $\delta$ will contain $\Omega_\delta(|Z|^2)$ proper arithmetic progressions of length 3, if $Z$ is sufficiently large depending on $\delta$. This is in contrast with Roth’s theorem which would only provide a single proper arithmetic progression of length 3. Nevertheless, a simple averaging argument shows that the two theorems are equivalent: see exercises.
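
The quantity $\Lambda_3(1_A, 1_A, 1_A)$ in (10.2) is simple to compute directly in a cyclic group. The sketch below (an illustration, with the hypothetical function name `lambda3` and exact rational arithmetic) counts the pairs $(x, r)$ for which $x, x + r, x + 2r$ all lie in $A \subseteq \mathbf{Z}_n$.

```python
from fractions import Fraction


def lambda3(a, n):
    """Lambda_3(1_A, 1_A, 1_A) in Z_n: the fraction of pairs (x, r) in Z_n x Z_n
    such that x, x + r and x + 2r all lie in A."""
    a = {x % n for x in a}
    hits = sum(
        1
        for x in a
        for r in range(n)
        if (x + r) % n in a and (x + 2 * r) % n in a
    )
    return Fraction(hits, n * n)
```

In $\mathbf{Z}_{15}$, the set $\{0, 1, 3, 4\}$ has no proper progressions of length 3 and `lambda3` returns $4/225 = \mathbf{P}_Z(A)/|Z|$, exactly as in (10.4). By contrast, for the subgroup $\{0, 3, 6, 9, 12\}$ of density $1/3$ it returns $1/9$, which is larger than the heuristic value $1/27$ suggested by (10.3).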

It is still not clear how to convert the heuristic (10.3) into a rigorous statement such as Theorem 10.9. Indeed (10.3) can fail for certain special $A$, with $\Lambda_3(1_A, 1_A, 1_A)$ ranging as high as $\mathbf{P}_Z(A)^2$ if $A$ is a subgroup of $Z$, and as low as $\mathbf{P}_Z(A)^{c \log \frac{1}{\mathbf{P}_Z(A)}}$ if $A$ is given by the Behrend example (see exercises). However, it turns out that $\Lambda_3(1_A, 1_A, 1_A)$ will be very close to $\mathbf{P}_Z(A)^3$ (as predicted by (10.3)) as long as $A$ has very little linear bias. Recall from Definition 4.12 that the linear bias (or Fourier bias) $\|A\|_u$ of an additive set $A$ was defined as

$$\|A\|_u := \sup_{\xi \in Z \setminus 0} |\hat 1_A(\xi)| = \sup_{\xi \in Z \setminus 0} |\mathbf{E}_{x \in Z} 1_A(x) e(-\xi \cdot x)|.$$
EEZ\0 EEZ\0 


Proposition 10.10 (Lack of progressions implies non-uniformity) [287] Let $A$ be an additive set in a finite additive group $Z$ of odd order. Then

$$|\Lambda_3(1_A, 1_A, 1_A) - \mathbf{P}_Z(A)^3| \le \|A\|_u \mathbf{P}_Z(A).$$

In particular, if $A$ contains no proper arithmetic progressions of length 3, then we have the linear bias estimate

$$\|A\|_u \ge \mathbf{P}_Z(A)^2 - \frac{1}{|Z|}.$$

Proof From the identity $a - 2(a + r) + (a + 2r) = 0$, and the observation that the map $x \mapsto 2 \cdot x$ is bijective on $Z$ when $|Z|$ is odd, we see that

$$\Lambda_3(1_A, 1_A, 1_A) = \frac{1}{|Z|^2} |\{(a_1, a_2, a_3) \in A \times (-2 \cdot A) \times A : 0 = a_1 + a_2 + a_3\}|.$$

Applying Lemma 4.13 we obtain the first inequality. The second claim then follows from (10.4).
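
The first inequality of Proposition 10.10 can be tested numerically in a cyclic group of odd order. The sketch below is only an illustration in floating-point arithmetic; the names `fourier_bias`, `lambda3` and `bias_gap` are hypothetical, and the set `a` is assumed to be given as a set of residues in `range(n)`.

```python
import cmath


def fourier_bias(a, n):
    """||A||_u: the maximum over xi != 0 of |E_x 1_A(x) e(-xi x / n)| in Z_n."""
    return max(
        abs(sum(cmath.exp(-2j * cmath.pi * xi * x / n) for x in a)) / n
        for xi in range(1, n)
    )


def lambda3(a, n):
    """Fraction of pairs (x, r) in Z_n x Z_n with x, x + r, x + 2r all in A."""
    hits = sum(
        1
        for x in a
        for r in range(n)
        if (x + r) % n in a and (x + 2 * r) % n in a
    )
    return hits / (n * n)


def bias_gap(a, n):
    """||A||_u P(A) - |Lambda_3 - P(A)^3|, which should be >= 0 when n is odd."""
    density = len(a) / n
    return fourier_bias(a, n) * density - abs(lambda3(a, n) - density ** 3)
```

For the subgroup $\{0, 3, 6, 9, 12\}$ of $\mathbf{Z}_{15}$ one has $\|A\|_u = 1/3$ and $\Lambda_3 - \mathbf{P}_Z(A)^3 = 2/27$, so `bias_gap` returns approximately $1/27$; the inequality holds with room to spare even though (10.3) fails badly for this set.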














This shows that the only way the heuristic (10.3) can fail is if the function $1_A$ has a large correlation with a linear character $e(\xi \cdot x)$. This very important observation can be viewed as an inverse theorem for $\Lambda_3$; we will return to this perspective in the next chapter. There is an analog of the above proposition for functions. Define the linear bias $\|f\|_{u^2(Z)}$ of a function $f : Z \to \mathbf{C}$ to be the quantity

$$\|f\|_{u^2(Z)} := \sup_{\xi \in Z} |\hat f(\xi)|. \qquad (10.5)$$

The reason for the notation $u^2(Z)$ will be made clearer in the next chapter. Note for instance that $\|A\|_u = \|1_A - \mathbf{P}_Z(A)\|_{u^2(Z)}$ for any $A \subseteq Z$.


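Proposition 10.11 Let $Z$ have odd order. For any functions $f, g, h : Z \to \mathbf{C}$, we have the identity

$$\Lambda_3(f, g, h) = \sum_{\xi \in Z} \hat f(\xi)\, \hat g(-2\xi)\, \hat h(\xi). \qquad (10.6)$$

We can then conclude the estimate

$$|\Lambda_3(f, g, h)| \le \|f\|_{u^2(Z)} \|g\|_{L^2(Z)} \|h\|_{L^2(Z)}$$

and similarly with $f, g, h$ permuted on the right-hand side.


Proof From the Fourier inversion formula (4.4) we have

$$f = \sum_{\xi_1} \hat f(\xi_1) e_{\xi_1}; \quad g = \sum_{\xi_2} \hat g(\xi_2) e_{\xi_2}; \quad h = \sum_{\xi_3} \hat h(\xi_3) e_{\xi_3},$$

and hence

$$\Lambda_3(f, g, h) = \sum_{\xi_1, \xi_2, \xi_3 \in Z} \hat f(\xi_1)\, \hat g(\xi_2)\, \hat h(\xi_3)\, \Lambda_3(e_{\xi_1}, e_{\xi_2}, e_{\xi_3}).$$

On the other hand, a direct computation using Lemma 4.5 shows

$$\Lambda_3(e_{\xi_1}, e_{\xi_2}, e_{\xi_3}) = \mathbf{I}(\xi_2 = -2\xi_1;\ \xi_3 = \xi_1)$$

which gives (10.6). From Parseval’s identity (4.3) and the hypothesis that $Z$ has odd order, we have

$$\sum_{\xi \in Z} |\hat g(-2\xi)|^2 = \|g\|_{L^2(Z)}^2; \quad \sum_{\xi \in Z} |\hat h(\xi)|^2 = \|h\|_{L^2(Z)}^2$$

and the claim then follows from Hölder’s inequality. Similarly if the roles of $f, g, h$ are permuted.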
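
The identity (10.6) can be checked numerically on a cyclic group $\mathbf{Z}_n$. The sketch below is an illustration only, under the assumption that the Fourier transform is normalized as $\hat f(\xi) = \mathbf{E}_x f(x) e(-\xi x / n)$; the function names are hypothetical, and functions on $\mathbf{Z}_n$ are represented as Python lists of length `n`.

```python
import cmath


def fourier(f, n):
    """Fourier coefficients hat f(xi) = E_x f(x) e(-xi x / n) on Z_n."""
    return [
        sum(f[x] * cmath.exp(-2j * cmath.pi * xi * x / n) for x in range(n)) / n
        for xi in range(n)
    ]


def lambda3_direct(f, g, h, n):
    """E_{x,r} f(x) g(x + r) h(x + 2r), computed from the definition (10.1)."""
    total = sum(
        f[x] * g[(x + r) % n] * h[(x + 2 * r) % n]
        for x in range(n)
        for r in range(n)
    )
    return total / (n * n)


def lambda3_fourier(f, g, h, n):
    """Right-hand side of (10.6): sum over xi of hat f(xi) hat g(-2 xi) hat h(xi)."""
    fh, gh, hh = fourier(f, n), fourier(g, n), fourier(h, n)
    return sum(fh[xi] * gh[(-2 * xi) % n] * hh[xi] for xi in range(n))
```

The two functions `lambda3_direct` and `lambda3_fourier` should agree up to rounding error for any choice of complex-valued lists `f`, `g`, `h`. The direct sum costs $O(n^2)$ operations, and so does this naive transform, so the sketch is meant for checking the identity rather than for speed.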














To exploit inverse results such as Proposition 10.10 or Proposition 10.11, there are two arguments available: the density increment argument of Roth, and the energy increment argument developed separately by Furstenberg and Szemerédi (in very different contexts). The density increment argument proceeds informally as follows. To prove Roth’s theorem, suppose for contradiction that one can find a dense set $A$ in a large group $Z$ (or interval $[1, N]$) which contains no progressions of length three. Proposition 10.10 then implies that $A$ has large linear bias, thus $1_A$ correlates with some linear phase function $e(\xi \cdot x)$. It then turns out that this linear bias can be converted into a density increment, or more precisely some structured subset $Z'$ (such as a subgroup, a sub-progression, or a Bohr set) of the original space $Z$ on which $A$ has larger density, thus $\mathbf{P}_{Z'}(A) > \mathbf{P}_Z(A)$. (Recall that $\mathbf{P}_{Z'}(A) = |A \cap Z'|/|Z'|$ and $\mathbf{P}_Z(A) = |A|/|Z|$.) One then passes to this structured subset and repeats the argument. If the original space $Z$ was large enough, we can run this argument for so many steps that the relative density of $A$ eventually exceeds 1, a contradiction.

The energy increment argument proceeds differently, aiming to prove Varnavides’ theorem instead of Roth’s theorem (i.e. one seeks non-trivial lower bounds on $\Lambda_3(f, f, f)$). Instead of continually changing the ambient space $Z$, we now hold $Z$ fixed, but instead construct certain low complexity approximations $f_{U^\perp}$ to the original function $f$. Initially, our approximation will just be the density, $f_{U^\perp} = \mathbf{P}_Z(A)$. We now consider the error $f_U := f - f_{U^\perp}$ between the indicator function and the approximation. If this error is very linearly uniform (in the sense that the Fourier bias $\|f_U\|_{u^2(Z)}$ is small), then Proposition 10.11 can be used to approximate $\Lambda_3(f, f, f)$ by $\Lambda_3(f_{U^\perp}, f_{U^\perp}, f_{U^\perp})$, and one can exploit the low complexity of $f_{U^\perp}$ to obtain a non-trivial lower bound on the latter quantity. If instead the error exhibits linear bias, one can exploit this by refining the approximation $f_{U^\perp}$ to absorb this bias; this will increase the energy $\|f_{U^\perp}\|_{L^2(Z)}^2$ of $f_{U^\perp}$ by a significant amount. One then repeats the argument until the error $f_U$ contains no further bias; a key point will be that $f$ (and hence $f_{U^\perp}$) remain bounded throughout the iteration and so the energy of $f_{U^\perp}$ cannot increase indefinitely.


Exercises 


10.1.1 Let $Z$ be a finite additive group of odd order, let $0 < \delta < 1$, and let $A$ be a random subset of $Z$ such that the events $x \in A$ are independent with probability $\mathbf{P}(x \in A) = \delta$. Show that with probability $1 - o_{|Z| \to \infty; \delta}(1)$, we have $\mathbf{P}_Z(A) = \delta + o_{|Z| \to \infty; \delta}(1)$ and $\Lambda_3(1_A, 1_A, 1_A) = \delta^3 + o_{|Z| \to \infty; \delta}(1)$, thus confirming (10.3) in the random case. (Hint: use Corollary 1.9.)

10.1.2 Let $Z$ be a finite additive group of odd order. Show that $\Lambda_3(1_A, 1_A, 1_A) \le \mathbf{P}_Z(A)^2$, with equality attained if and only if $A$ is the translate of a subgroup of $Z$.

10.1.3 Let $N, d, r \ge 1$ be integers, and consider the set

$$A := \{(n_1, \ldots, n_d) \in [0, N/2)^d : n_1^2 + \cdots + n_d^2 = r\},$$

viewed as a subset of $\mathbf{Z}_N^d$. Show that this set has no proper arithmetic progressions of length 3, and can have cardinality as large as $(N/2)^d/(d^2 N^2)$ for a suitable choice of $r$. Conclude in particular that $r_3(\mathbf{Z}_N^d) \ge N^d/(2^d d^2 N^2)$.

10.1.4 (Behrend’s example) [21] Using the preceding exercise and a Freiman isomorphism, establish the bounds

$$r_3(\mathbf{Z}_N),\ r_3([1, N]) = \Omega(N e^{-O(\sqrt{\log N})})$$

for all large $N$. In particular, it is not the case that $r_3([1, N]), r_3(\mathbf{Z}_N) = O(N^{1-\varepsilon})$ for any fixed $\varepsilon > 0$. This rules out a number of elementary approaches to proving Roth’s theorem or Szemerédi’s theorem (e.g. arguments based entirely on Cauchy–Schwarz and pigeonhole principle type arguments) as these tend to only give polynomial type bounds. We remark that the more general estimate

$$r_k(\mathbf{Z}_N),\ r_k([1, N]) = \Omega\left(N \exp\left(-O_k\left((\log N)^{1/(1 + \lfloor \log_2(k-1) \rfloor)}\right)\right)\right)$$

for all $k \ge 3$ has been established in [277], [221] by a similar argument.


10.1.5 Given any $0 < \delta \leq 1$, give an example of an additive set $A$ in a cyclic group $\mathbf{Z}_N$ such that $\mathbf{P}_Z(A) \geq \delta$ but

$$ \Lambda_3(1_A, 1_A, 1_A) = O\big(\delta^{c \log \frac{1}{\delta}}\big) $$

for some absolute constant $c > 0$. (Hint: use the Behrend example.) Thus it is not possible to establish any lower bound of the form $\Lambda_3(1_A, 1_A, 1_A) = \Omega(\mathbf{P}_Z(A)^C)$ for any absolute constant $C > 0$.

10.1.6 [253] Let $N$ be a large number. Show that one can color $\mathbf{Z}_N$ into $\exp(O(\sqrt{\log N}))$ color classes, such that none of the color classes contains a proper arithmetic progression of length three. (Hint: modify the Behrend example.)

10.1.7 Show that Varnavides’ theorem for sets $A$ implies Varnavides’ theorem for functions $f$. (Hint: either bound $f$ from below by a constant multiple of an indicator function, or construct a set $A$ probabilistically using $f(x)$ as the probability that $x \in A$ and use the first moment method.)

10.1.8 Show that the special case $r_3([1, N]) = o_{N \to \infty}(N)$ of Roth’s theorem implies Varnavides’ theorem for $\mathbf{Z}_N$. (Hint: take a set $A$ in $\mathbf{Z}_N$ and intersect it with a randomly chosen progression $a + [1, M] \cdot r$ for some moderately large $M$, and apply Roth’s theorem to the progression $a + [1, M] \cdot r$. Then use the first moment method.)

10.1.9 Let $F$ be a finite field. Show that the special case $r_3(F^n) = o_{n \to \infty; F}(|F|^n)$ of Roth’s theorem implies Varnavides’ theorem for $F^n$. (Hint: take a set $A$ in $F^n$ and intersect it with a randomly chosen $m$-dimensional affine subspace of $F^n$ for some moderately large $m$. Then argue as in the preceding exercise.)

10.1.10 Show that Roth’s theorem for arbitrary $Z$ implies Varnavides’ theorem for arbitrary $Z$.

10.1.11 Use Proposition 10.11 and the decomposition $1_A = (1_A - \mathbf{P}_Z(A)) + \mathbf{P}_Z(A)$ to provide an alternative proof of Proposition 10.10.

10.1.12 Assume Theorem 10.9. Let $(X, \mathcal{B}, \mu)$ be any probability space (so $\mu(X) = 1$), and let $T : X \to X$ be any measure-preserving bijection on $X$, so $\mu(T^n(E)) = \mu(E)$ for all $E \in \mathcal{B}$ and $n \in \mathbf{Z}$. Show that if $f : X \to \mathbf{R}^+$ is any function with $0 \leq f(x) \leq 1$ almost everywhere and $\int_X f = \delta > 0$, then


$$ \liminf_{N \to \infty} \mathbf{E}_{1 \leq n \leq N} \int_X f(x)\, T^n f(x)\, T^{2n} f(x)\ d\mu(x) = \Omega_\delta(1). $$
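As a small computational aside (not part of the original exercises), the sphere construction of Exercises 10.1.3 and 10.1.4 can be checked by brute force for tiny parameters. The Python sketch below is only an illustration under stated assumptions: the function names `sphere_set`, `has_proper_3ap` and `best_sphere` are hypothetical, the modulus $N$ is taken odd so that a non-zero step always gives three distinct points, and the parameters $N = 11$, $d = 3$ are chosen purely so that the search is instantaneous.

```python
from itertools import product

def sphere_set(N, d, r):
    """Points (n_1, ..., n_d) with 0 <= n_i < N/2 and n_1^2 + ... + n_d^2 = r."""
    half = (N + 1) // 2  # number of integers n with 0 <= n < N/2
    return [v for v in product(range(half), repeat=d)
            if sum(c * c for c in v) == r]

def has_proper_3ap(points, N):
    """True if the set contains x, x+r, x+2r in (Z_N)^d with all three distinct."""
    S = set(points)
    for x in S:
        for y in S:
            if x == y:
                continue
            # third term of the progression x, y, z with common difference y - x
            z = tuple((2 * b - a) % N for a, b in zip(x, y))
            if z in S and z != x:
                return True
    return False

def best_sphere(N, d):
    """Choose the radius r whose sphere contains the most points (pigeonhole step)."""
    half = (N + 1) // 2
    max_r = d * (half - 1) ** 2
    return max((sphere_set(N, d, r) for r in range(max_r + 1)), key=len)

A = best_sphere(11, 3)
print(len(A), has_proper_3ap(A, 11))
```

The restriction of the coordinates to $[0, N/2)$ is what prevents wrap-around modulo $N$, so that a progression in $\mathbf{Z}_N^d$ is a genuine progression of lattice points on a sphere, which strict convexity forbids.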


378 10 Szemerédi’s theorem for k = 3 


10.2 The small torsion case 


We now use the above Fourier-analytic methods and the density increment argu- 
ment to prove the following simple special case of Roth’s theorem. 


Proposition 10.12 (Roth’s theorem for p-torsion groups) [248] Let $Z$ be a $p$-torsion group (thus $px = 0$ for all $x \in Z$) for some odd prime $p$. Then

$$ r_3(Z) \leq \frac{3}{\log_p |Z|} |Z|. $$

Remark 10.13 Define a capset to be a subset of the vector space $\mathbf{Z}_3^n$ which contains no lines. Then the above proposition implies that capsets have density less than $3/n$. Rather amazingly, this simple bound is essentially the best known (other than improving the constant 3); in the converse direction, the best lower bound known on the density of capsets in $\mathbf{Z}_3^n$ is $(0.724581\ldots + o(1))^n$; see [75]. Any improvement of the upper bound to $o(1/n)$, or the lower bound to $(1 - o(1))^n$, would be a significant advance in our understanding of the Erdős–Turán conjecture.
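For very small dimensions the quantity $r_3(F_p^n)$ can be computed by exhaustive search, which gives a concrete feel for the gap between the upper bound $3p^n/n$ and the truth. The following Python sketch is an illustration only: the function name `max_progression_free` is hypothetical, $p$ is assumed to be an odd prime, and the search over all subsets is feasible only for a handful of points (here $p = 3$, $n \leq 2$).

```python
from itertools import product

def max_progression_free(p, n):
    """Brute-force the largest subset of (F_p)^n with no proper 3-term progression."""
    pts = list(product(range(p), repeat=n))
    index = {v: i for i, v in enumerate(pts)}
    # Collect every proper progression {x, x+r, x+2r}, r != 0, as a set of indices.
    progressions = set()
    for x in pts:
        for r in pts:
            if any(r):
                progressions.add(frozenset(
                    index[tuple((a + k * b) % p for a, b in zip(x, r))]
                    for k in range(3)))
    best = 0
    for mask in range(1 << len(pts)):
        size = bin(mask).count("1")
        if size <= best:
            continue
        # The subset is admissible if every progression misses at least one point.
        if all(any(not (mask >> i) & 1 for i in prog) for prog in progressions):
            best = size
    return best

for n in (1, 2):
    print(n, max_progression_free(3, n), 3 * 3 ** n / n)
```

In these two cases the search returns $2$ and $4$, matching the product lower bound $2^n$ of Exercise 10.2.1 and lying well below $3 \cdot 3^n / n$.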


Remark 10.14 A useful heuristic is that the cyclic group $\mathbf{Z}_N$ (or the interval $[1, N]$) should behave roughly like the $p$-torsion group $\mathbf{Z}_p^n$ whenever $N \approx p^n$. Using this heuristic and the above proposition, one would expect that $r_3([1, N])$ and $r_3(\mathbf{Z}_N)$ should be $O(N/\log N)$. Such a bound would essentially be equivalent to the Erdős–Turán conjecture (Conjecture 10.6) in the $k = 3$ case. Unfortunately the direct analog of the above argument gives $r_3([1, N]), r_3(\mathbf{Z}_N) = O\big(N \sqrt{\frac{\log \log N}{\log N}}\big)$; see Theorem 10.30. In general, the $p$-torsion groups are somewhat easier to analyze than general groups, due to their vector space structure over the field $F_p$. To extend the $p$-torsion arguments to more general settings, one needs some additional machinery, in particular the theory of Bohr sets.


We now begin the proof of Proposition 10.12. We may view $Z$ as a vector space over $F_p$. Assume for contradiction that we can find a set $A \subset Z$ of density $\mathbf{P}_Z(A) > \frac{3}{\log_p |Z|}$ which has no proper progressions of length 3. From Corollary 10.10 we already know that $A$ must exhibit linear bias, thus $\|A\|_u$ is large. To use this fact, we need to convert linear bias to a more useful structural property. This is achieved as follows.


Lemma 10.15 (Non-uniformity implies density increment) Let $Z$ be a vector space over a finite field $F_p$ of prime order, and let $f : Z \to \mathbf{R}$ be a function with mean zero, $\mathbf{E}_Z(f) = 0$. Then there exists a subspace $Z'$ of $Z$ of codimension 1 over $F_p$, and a point $x_0 \in Z$, such that

$$ \mathbf{E}_{x \in x_0 + Z'} f(x) \geq \frac{1}{2} \|f\|_{u^2(Z)}. $$




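Proof Without loss of generality we may take $Z = F_p^n$, and use the bilinear form in Example 4.2.

By definition of $\|f\|_{u^2(Z)}$ and the mean zero hypothesis, we can find a non-zero $\xi \in Z$ and a phase $\theta \in \mathbf{R}/\mathbf{Z}$ such that

$$ \mathrm{Re}\ \mathbf{E}_{y \in Z} f(y) e(\xi \cdot y + \theta) = \|f\|_{u^2(Z)}, $$

where $e$ is the exponential map defined by equation (4.1). Applying the mean zero hypothesis again, we conclude

$$ \mathrm{Re}\ \mathbf{E}_{y \in Z} f(y) (e(\xi \cdot y + \theta) + 1) = \|f\|_{u^2(Z)}. $$

Let $Z' := \{\xi\}^\perp = \{x \in Z : \xi \cdot x = 0\}$ be the orthogonal complement of $\xi$; then $Z'$ is a subspace of $Z$ of codimension 1, and the function $y \mapsto e(\xi \cdot y + \theta) + 1$ is constant on every coset of $Z'$. Making the change of variables $y = x_0 + x$ for each $x \in Z'$, and then averaging over $x$, we conclude

$$ \mathrm{Re}\ \mathbf{E}_{y \in Z} f(y) (e(\xi \cdot y + \theta) + 1) = \mathbf{E}_{x \in Z'} \mathbf{E}_{x_0 \in Z} f(x_0 + x)\, \mathrm{Re}(e(\xi \cdot x_0 + \theta) + 1) $$
$$ = \mathbf{E}_{x_0 \in Z} \big(\mathbf{E}_{x \in x_0 + Z'} f(x)\big)\, \mathrm{Re}(e(\xi \cdot x_0 + \theta) + 1). $$

By the pigeonhole principle there must therefore exist a coset $x_0 + Z'$ such that

$$ \big(\mathbf{E}_{x \in x_0 + Z'} f(x)\big)\, \mathrm{Re}(e(\xi \cdot x_0 + \theta) + 1) \geq \|f\|_{u^2(Z)}. $$

Since $\mathrm{Re}(e(\xi \cdot x_0 + \theta) + 1) \leq 2$, the claim follows.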
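The conclusion of Lemma 10.15 is easy to test numerically on a small vector space. The sketch below is a hedged illustration, not part of the original argument: it assumes the normalization $\hat f(\xi) = \mathbf{E}_{x \in Z} f(x) \overline{e(\xi \cdot x)}$ with $\xi \cdot x = (\xi_1 x_1 + \cdots + \xi_n x_n)/p$, takes $\|f\|_{u^2(Z)}$ to be the largest $|\hat f(\xi)|$, and uses the hypothetical name `check_density_increment`. It draws a random mean-zero function on $F_3^3$ and compares the best average over a coset of a hyperplane with half of that norm.

```python
import cmath
import random
from itertools import product

def check_density_increment(p=3, n=3, seed=0):
    """Return (u2_norm, best_coset_average) for a random mean-zero f on (F_p)^n."""
    rng = random.Random(seed)
    pts = list(product(range(p), repeat=n))
    vals = [rng.uniform(-1.0, 1.0) for _ in pts]
    mean = sum(vals) / len(vals)
    f = {x: v - mean for x, v in zip(pts, vals)}  # subtract the mean: E_Z(f) = 0

    def dot(xi, x):
        return sum(a * b for a, b in zip(xi, x)) % p

    # Largest Fourier coefficient in absolute value (assumed u^2 norm).
    u2 = max(
        abs(sum(f[x] * cmath.exp(-2j * cmath.pi * dot(xi, x) / p) for x in pts))
        / len(pts)
        for xi in pts)

    # Every codimension-one subspace is {x : xi.x = 0} for some xi != 0;
    # its cosets are the level sets {x : xi.x = c}, each with p^(n-1) points.
    coset_size = len(pts) // p
    best = max(
        sum(f[x] for x in pts if dot(xi, x) == c) / coset_size
        for xi in pts if any(xi)
        for c in range(p))
    return u2, best

u2, best = check_density_increment()
print(u2, best, best >= 0.5 * u2)
```

The search maximizes over all hyperplane cosets rather than only those orthogonal to the extremal frequency, so it can only do better than the coset produced in the proof.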











Remark 10.16 The reason to add 1 to $e(\xi \cdot y + \theta)$ is to make sure that $\mathrm{Re}(e(\xi \cdot y + \theta) + 1)$ is non-negative. We will use this trick repeatedly in this chapter.


We can now prove Proposition 10.12, by using the density increment argument 
of Roth. 


Proof of Proposition 10.12 By Corollary 3.8 we may take $Z = F_p^n$, with the standard bilinear form in Example 4.2. We induce on $n$. The claim is trivial when $n \leq 3$, so suppose $n > 3$. Suppose for contradiction that $r_3(F_p^n) > \frac{3}{n} p^n$; then we can find a set $A \subset Z$ with density $\mathbf{P}_Z(A) > 3/n$ containing no proper progressions of length 3. Then by Lemma 10.15 (applied to $f := 1_A - \mathbf{P}_Z(A)$) we have a coset $x_0 + Z'$ of a subspace $Z'$ of $Z$ of codimension one such that

$$ \mathbf{P}_{x_0 + Z'}(A) \geq \mathbf{P}_Z(A) + \frac{1}{2} \|A\|_u. $$

Applying Corollary 10.10 we conclude

$$ \mathbf{P}_{x_0 + Z'}(A) > \frac{3}{n} + \frac{9}{2n^2} - \frac{1}{2|Z|} \geq \frac{3}{n} + \frac{4}{n^2} \geq \frac{3}{n-1} $$

since $|Z| = p^n \geq n^2$ and $n > 3$. By the induction hypothesis, the set $(A - x_0) \cap Z'$ thus contains a proper arithmetic progression of length 3, and hence $A$ does also, which gives the desired contradiction.














A very similar argument also establishes Varnavides’ theorem in this setting: 


Proposition 10.17 (Varnavides’s theorem for p-torsion groups) Let $Z$ be a $p$-torsion group for some odd prime $p$, and let $f : Z \to \mathbf{R}^+$ be such that $0 \leq f(x) \leq 1$ for all $x \in Z$. Then

$$ \Lambda_3(f, f, f) \geq p^{-6/\mathbf{E}_Z(f)}. $$


Proof We induce on $n := \lfloor 3/\mathbf{E}_Z(f) \rfloor$. When $n \leq 3$ the claim is trivial, so suppose $n > 3$ and the claim has already been proven for $n - 1$. We may again view $Z$ as a vector space over $F_p$, with a standard bilinear form. Write $f = f_U + f_{U^\perp}$, where $f_{U^\perp} := \mathbf{E}_Z(f)$ and $f_U := f - f_{U^\perp}$. Observe that

$$ \Lambda_3(f_{U^\perp}, f_{U^\perp}, f_{U^\perp}) = \mathbf{E}_Z(f)^3. $$

If we had

$$ \Lambda_3(f, f, f) \geq \mathbf{E}_Z(f)^3/9 $$

(say) then we would be done (since $\mathbf{E}_Z(f)^3/9 \geq p^{-6/\mathbf{E}_Z(f)}$), so let us assume instead that

$$ |\Lambda_3(f, f, f) - \Lambda_3(f_{U^\perp}, f_{U^\perp}, f_{U^\perp})| \geq 8\mathbf{E}_Z(f)^3/9. $$

We can rewrite the left-hand side as the telescoping sum of three terms,

$$ |\Lambda_3(f_U, f, f) + \Lambda_3(f_{U^\perp}, f_U, f) + \Lambda_3(f_{U^\perp}, f_{U^\perp}, f_U)|. $$

From their definitions, we see that $f_U$ has mean zero, and $f_{U^\perp}$ is constant. Thus one can easily verify that the latter two terms vanish. Hence

$$ |\Lambda_3(f_U, f, f)| \geq 8\mathbf{E}_Z(f)^3/9. $$

Since $f$ is bounded by 1, we have

$$ \|f\|_{L^2(Z)}^2 = \mathbf{E}_Z(f^2) \leq \mathbf{E}_Z(f) $$

and hence by Proposition 10.11 we have

$$ \|f_U\|_{u^2(Z)} \geq 8\mathbf{E}_Z(f)^2/9. $$

Applying Lemma 10.15, we can find a subspace $Z'$ of $Z$ of codimension 1 and a point $x_0 \in Z$ such that

$$ \mathbf{E}_{x \in x_0 + Z'} f(x) \geq \mathbf{E}_Z(f) + 4\mathbf{E}_Z(f)^2/9. $$

If we let $g : Z' \to \mathbf{R}$ be the function $g(x) := f(x + x_0)$, then $g$ ranges between 0 and 1 and we have

$$ \mathbf{E}_{Z'}(g) \geq \mathbf{E}_Z(f) + 4\mathbf{E}_Z(f)^2/9; $$

this in particular forces $\mathbf{E}_Z(f) \leq 3/4$, and then from elementary algebra one concludes

$$ \frac{6}{\mathbf{E}_{Z'}(g)} \leq \frac{6}{\mathbf{E}_Z(f)} - 2. $$

By the induction hypothesis we then have

$$ \Lambda_3(g, g, g) \geq p^2 p^{-6/\mathbf{E}_Z(f)}, $$

while from definition of $g$ and positivity of $f$ we have $\Lambda_3(f, f, f) \geq p^{-2} \Lambda_3(g, g, g)$. This completes the induction.














A remarkable phenomenon is that lower bounds of the above type still persist 
when the boundedness condition f < 1 is replaced by a more general condition 
f < v, providing that the enveloping weight v is sufficiently pseudo-random. This 
phenomenon (essentially first observed in [212], [147]) was made more explicit in 
[158], when a transference principle was formulated. This principle was aimed at 
studying progressions of arbitrary length k and was phrased in an ergodic theory 
language, but a parallel Fourier-analytic principle in k = 3 exists, and was devel- 
oped in [159]. We give a simplified formulation of this result below, in the special 
contexts of random subsets of p-torsion groups. Specifically, we shall prove 


Theorem 10.18 (Roth’s theorem in random subsets of torsion groups) Let $Z$ be a finite $p$-torsion group for some odd prime $p$, let $|Z|^{-0.01} \leq t \leq 1$, and let $B$ be a random subset of $Z$ with the events $x \in B$ being independent with probability $\mathbf{P}(x \in B) = t$. Then with probability $1 - o_{|Z| \to \infty; p}(1)$ we have $r_3(B) = o_{|Z| \to \infty; p}(|B|)$.


Remark 10.19 The point of this theorem is that it allows us to detect arithmetic 
progressions in subsets of $Z$ of density as low as $|Z|^{-0.01}$, which is well beyond
the reach of Proposition 10.12, provided that those sets have large relative density compared to a random set. A modification of the proof given below can be used to establish that any subset of the primes of positive relative density contains
infinitely many arithmetic progressions of length 3; see [147], [159]; the point was 
that the primes were contained in a set of “almost primes” which was very uniform 
(or “pseudo-random’’) and thus behaved very much like a random set in a certain 
Fourier-analytic sense. By replacing the Fourier-analytic methods with ergodic 
theory methods (and replacing linear uniformity with the notion of Gowers uni- 
formity, which could be obtained for the almost primes by some number-theoretic 
arguments of Goldston and Yildirim), this result was then extended to cover arith- 
metic progressions of arbitrary length; see [158]. Note that the original proof in 
[212] relied on the Szemerédi regularity lemma (Lemma 10.42 below) instead of 
Fourier-analytic methods (and has weaker bounds as a consequence); on the other 
hand, it works for an arbitrary finite additive group Z of odd order, and allows the 
density $t$ to approach $|Z|^{-1/2}$, which is the optimal value (Exercise 10.2.3).


We now begin the proof of Theorem 10.18. We shall need the following exten- 
sion of Proposition 10.17, in which f is not bounded by 1, but is instead bounded 
by a “pseudo-random measure”, and also enjoys some Fourier bounds. 


Theorem 10.20 [159] Let $Z$ be a finite $p$-torsion group for some odd prime $p$, and let $f : Z \to \mathbf{R}_{\geq 0}$ be a non-negative function such that

$$ \|\hat f\|_{l^q(Z)} \leq M \qquad (10.7) $$

for some $2 < q < 3$ and $0 < M < \infty$. Suppose also that we have the bound $f \leq \nu$ where $\nu : Z \to \mathbf{R}_{\geq 0}$ obeys the pseudo-randomness condition

$$ |\hat \nu(\xi) - I(\xi = 0)| \leq \eta \ \hbox{ for all } \xi \in Z \qquad (10.8) $$

for some $0 < \eta \leq 1$. Then we have

$$ \Lambda_3(f, f, f) \geq 8 p^{-12/\mathbf{E}_Z(f)} - 7 M^3 \log_p^{-(3-q)/q} \frac{1}{\eta}. $$

Note that Proposition 10.17 corresponds to the case $\nu = 1$, in which case we can take $\eta = 0$ (and $q$, $M$ are irrelevant). More generally, this theorem is useful when $\eta$ is very small compared to $\delta$ and $M$. The constants can be improved somewhat but this will not concern us here.


Proof We may assume $Z$ is a vector space over $F_p$, with a bilinear form as in Example 4.2. Let $\alpha := M / \log_p^{1/q} \frac{1}{\eta}$. We recall the spectrum $\mathrm{Spec}_\alpha(f) \subseteq Z$, defined as

$$ \mathrm{Spec}_\alpha(f) := \{\xi \in Z : |\hat f(\xi)| \geq \alpha\}. $$

From the hypothesis (10.7) and Chebyshev’s inequality we have

$$ |\mathrm{Spec}_\alpha(f)| \leq M^q/\alpha^q = \log_p \frac{1}{\eta}. \qquad (10.9) $$

Thus if we let $V := \mathrm{Spec}_\alpha(f)^\perp$ be the orthogonal complement to $\mathrm{Spec}_\alpha(f)$, then $V$ is a subspace of $Z$ and¹

$$ |V^\perp| \leq p^{|\mathrm{Spec}_\alpha(f)|} \leq \frac{1}{\eta}. \qquad (10.10) $$

¹ This is extremely crude. It is likely that one can use the machinery of dissociated sets as in Lemma 4.36 to do better here.

We split $f = f_U + f_{U^\perp}$, where $f_U := f - f * \frac{1}{\mathbf{P}_Z(V)} 1_V$ is the “uniform” component of $f$ and $f_{U^\perp} := f * \frac{1}{\mathbf{P}_Z(V)} 1_V$ is the “anti-uniform” component. This allows us to split $\Lambda_3(f, f, f)$ into eight terms,

$$ \Lambda_3(f, f, f) = \Lambda_3(f_U, f_U, f_U) + \cdots + \Lambda_3(f_{U^\perp}, f_{U^\perp}, f_U) + \Lambda_3(f_{U^\perp}, f_{U^\perp}, f_{U^\perp}). $$

The idea is to use Proposition 10.17 to obtain lower bounds on the last term, and (10.6) to obtain magnitude bounds on the remaining seven terms.

We begin by controlling $f_{U^\perp}$. Since $f$ is bounded pointwise by $\nu$, we can use the Poisson summation formula (Exercise 4.1.7) and (10.10), (10.8) to obtain

$$ f_{U^\perp}(x) \leq \nu * \frac{1}{\mathbf{P}_Z(V)} 1_V (x) = \sum_{\xi \in V^\perp} \hat \nu(\xi) e(\xi \cdot x) \leq 1 + |V^\perp| \sup_{\xi \in V^\perp \setminus 0} |\hat \nu(\xi)| \leq 1 + \frac{1}{\eta} \eta = 2. $$

We thus see that $f_{U^\perp}$ is bounded above by 2. Also it is non-negative and $\mathbf{E}_Z(f_{U^\perp}) = \mathbf{E}_Z(f)$ thanks to (4.10). Thus by Proposition 10.17 (applied to $f_{U^\perp}/2$) we have

$$ \Lambda_3(f_{U^\perp}, f_{U^\perp}, f_{U^\perp}) \geq 8 p^{-12/\mathbf{E}_Z(f)}. $$

Now we consider the other terms. From the Poisson summation formula again we have

$$ \hat f_{U^\perp} = \hat f\, 1_{V^\perp} \quad \hbox{and} \quad \hat f_U = \hat f\, (1 - 1_{V^\perp}). $$

In particular we have

$$ \|\hat f_U\|_{l^q(Z)},\ \|\hat f_{U^\perp}\|_{l^q(Z)} \leq M. $$

Furthermore, since $V^\perp$ contains $\mathrm{Spec}_\alpha(f)$, we see that

$$ \sup_{\xi \in Z} |\hat f_U(\xi)| \leq \alpha. $$

Applying (10.6) and Hölder’s inequality we obtain

$$ |\Lambda_3(f_U, f_U, f_U)| \leq M^q \alpha^{3-q} = M^3 \log_p^{-(3-q)/q} \frac{1}{\eta} $$

and similarly for the other six $\Lambda_3()$ expressions to be estimated. The claim follows.














Remark 10.21 The strategy of the above transference argument was to identify a fairly coarse partition of $Z$ (in this case, into cosets of $V$) to average against in order to produce a well-behaved approximant $f_{U^\perp}$ to $f$, with the error $f_U$ between $f$ and $f_{U^\perp}$ being so uniform (in the Fourier sense) as to be negligible. This philosophy was developed in a quantitative manner in [150], in which an arithmetic version of the Szemerédi regularity lemma was obtained.


The hypothesis (10.7) in this theorem may seem to be restrictive, but in many cases one can control the $l^q$ norm of $\hat f$, or at least the spectrum $\mathrm{Spec}_\alpha(f)$ of $f$, by exploiting the pseudo-randomness properties of $\nu$. For instance, one has


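Lemma 10.22 (Tomas-Stein argument) Let $Z$ be a finite additive group, and let $\nu : Z \to \mathbf{R}^+$ and $f : Z \to \mathbf{C}$ be such that (10.8) holds for some $\eta$, and such that $|f(x)| \leq \nu(x)$ for all $x \in Z$. For any $\alpha > 0$ let $\mathrm{Spec}_\alpha(f) := \{\xi \in Z : |\hat f(\xi)| \geq \alpha\}$. Then we have

$$ |\mathrm{Spec}_\alpha(f)| \leq 4/\alpha^2 $$

for all $\alpha > 2\eta^{1/2}$.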


Remark 10.23 This estimate should be compared with (4.37); the point is that 
no $L^2$ bound on $f$ is assumed, otherwise this type of estimate would follow from
Plancherel’s theorem. The orthogonality argument used here plays a fundamental 
role in the restriction theory of the Fourier transform, see for instance [356] for 
a survey. It is also closely related to the large sieve inequality in analytic number 
theory. 


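Proof For each $\xi \in \mathrm{Spec}_\alpha(f)$ let $c(\xi) := \overline{\mathrm{sgn}(\hat f(\xi))}$. Then we have

$$ \sum_{\xi \in \mathrm{Spec}_\alpha(f)} c(\xi) \hat f(\xi) = \sum_{\xi \in \mathrm{Spec}_\alpha(f)} |\hat f(\xi)| \geq \alpha |\mathrm{Spec}_\alpha(f)|. $$

But the left-hand side can be rewritten as

$$ \mathbf{E}_Z \Big( f \sum_{\xi \in \mathrm{Spec}_\alpha(f)} c(\xi)\, \overline{e(\xi \cdot x)} \Big). $$

Since $|f| \leq \nu$, we may use Cauchy–Schwarz and conclude that

$$ \alpha |\mathrm{Spec}_\alpha(f)| \leq \mathbf{E}_Z(\nu)^{1/2}\, \mathbf{E}_Z \Big( \nu \Big| \sum_{\xi \in \mathrm{Spec}_\alpha(f)} c(\xi)\, \overline{e(\xi \cdot x)} \Big|^2 \Big)^{1/2}. $$

Since $\mathbf{E}_Z(\nu) = \hat \nu(0) \leq 1 + \eta \leq 2$, we thus conclude that

$$ \mathbf{E}_Z \Big( \nu \Big| \sum_{\xi \in \mathrm{Spec}_\alpha(f)} c(\xi)\, \overline{e(\xi \cdot x)} \Big|^2 \Big) \geq \frac{1}{2} \alpha^2 |\mathrm{Spec}_\alpha(f)|^2. $$

We can expand the left-hand side as

$$ \sum_{\xi, \xi' \in \mathrm{Spec}_\alpha(f)} c(\xi) \overline{c(\xi')}\, \mathbf{E}_Z\big(\nu\, \overline{e(\xi \cdot x)}\, e(\xi' \cdot x)\big) = \sum_{\xi, \xi' \in \mathrm{Spec}_\alpha(f)} c(\xi) \overline{c(\xi')}\, \hat \nu(\xi - \xi'). $$

But since $|c(\xi)| = 1$ and $|\hat \nu(\xi - \xi')| \leq \eta + I(\xi - \xi' = 0)$, we conclude that

$$ \frac{1}{2} \alpha^2 |\mathrm{Spec}_\alpha(f)|^2 \leq \sum_{\xi, \xi' \in \mathrm{Spec}_\alpha(f)} \big(\eta + I(\xi - \xi' = 0)\big) \leq \eta |\mathrm{Spec}_\alpha(f)|^2 + |\mathrm{Spec}_\alpha(f)|. $$

Since $\alpha > 2\eta^{1/2}$, we have $\eta |\mathrm{Spec}_\alpha(f)|^2 \leq \frac{1}{4} \alpha^2 |\mathrm{Spec}_\alpha(f)|^2$, and the claim follows.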














We can now prove Theorem 10.18. 


Proof of Theorem 10.18 We may assume that |Z| is sufficiently large depending 
on 6, p since the claim is vacuous otherwise. We shall abbreviate 0)z)-,00;p(1) 
simply as o(1). From Corollary 1.9 we have Pz(B) = t + oqz!) (say) with 
probability $1 - o(1)$; in particular $B$ is non-empty. Also, if we set $\nu := 1_B/t$, then
by Lemma 4.16 (with A replaced by Z) we have 


sup |0(€)| = O(|Z|-') 
EEZ\0 


again with probability 1 — o(1). Combining this with our density bound on Pz(B), 
we thus have 


up DE) -— Ié = 0)| = O(|Z|'°) (10.11) 


with probability 1 — o(1). Henceforth we shall condition on these events. 

Let $\delta = \delta(|Z|, p) < 1$ be a small quantity decaying to zero very slowly as $|Z| \to \infty$ (i.e. $\delta = o(1)$); it will suffice to show that for $\delta$ sufficiently slowly decaying, and conditioning on the previous events, every subset $A$ of $B$ with relative density $|A|/|B| \geq \delta$ will contain a proper arithmetic progression of length 3.




Set $f := 1_A/t$. Clearly $f$ is non-negative and $f \leq \nu$. Also $\mathbf{E}_Z(f) \geq \delta \mathbf{P}_Z(B)/t = \Theta(\delta)$. From Lemma 10.22 and (10.11) we have


$$ |\mathrm{Spec}_\alpha(f)| \leq 4/\alpha^2 \ \hbox{ whenever } \alpha = \Omega(|Z|^{-1/10}), $$


while from (4.2) we have the very crude bound IF liz) <t? < |Z|”. Com- 
bining these two estimates, we easily obtain 


II f lliscz) = 00) (10.12) 


(for instance); see Exercise 10.2.2. Applying Theorem 10.20 (with n := |Z|~!/>), 
we conclude 


A3(f, ts f) = 8p 7/5 = O(log” IZI). 


On the other hand, if A contained no arithmetic progressions of length 3, then we 
would have 





A 1 
Ah f= os = 0 (sz) = 002, 


which would lead to a contradiction if Z was large compared with ô, p, and the 
claim follows. 














We remark that the above argument is quite quantitative, and it is not difficult 
to use it to extract specific bounds for Theorem 10.18, but we will not do so here. 


Exercises 


10.2.1 Show that if $A$, $B$ are two additive sets (possibly in different ambient groups) then $r_k(A \times B) \geq r_k(A) r_k(B)$. Conclude in particular that $r_3(F_3^n) \geq 2^n$ for all $n$.

10.2.2 Deduce (10.12) from the bounds on Spec, (f) and IÊ. (Hint: one 
can use an analog of (1.7).) 

10.2.3 Show that Theorem 10.18 fails if $t = |Z|^{-1/2 - \varepsilon}$ for any absolute constant $\varepsilon > 0$. (Hint: count the number of proper progressions of length 3 in $B$, and remove them to create $A$.)

10.2.4 [248] Let $s(n, d)$ be the quantity defined in Section 9.6. Show that $s(3, d) = O(r_3(F_3^d))$. In particular, we have $s(3, d) = O(3^d/d)$ for large $d$.


10.3 The integer case 


We now sketch the proof of Roth’s theorem for integers (which was the original 
setting for Roth’s argument). We shall be somewhat brief here as the result will be 
superseded by the Roth—Bourgain theorem, Theorem 10.30. 




As in the proof of Proposition 10.12, we need two ingredients; first, we need
to show that lack of progressions in [1, N] implies some linear bias, and second 
we need to convert this linear bias to a density increment on a sub-progression 
of [1, N]. Because [1, N] is not quite a group, we cannot apply Corollary 10.10 
directly. However we have the following substitute. 


Proposition 10.24 (Lack of progressions implies non-uniformity) [287] Let $P$ be an arithmetic progression of integers, and let $A \subset P$ be such that $|A| = \delta |P|$ for some $0 < \delta \leq 1$. Assume also that $|P| \geq 100/\delta^2$, and that $A$ contains no proper arithmetic progressions of length 3. Then there exists $\xi \in \mathbf{R}/\mathbf{Z}$ such that

$$ |\mathbf{E}_{n \in P} (1_A(n) - \delta) e(n\xi)| = \Omega(\delta^2). $$


Proof By a rescaling argument one can take $P = [1, N]$. By Bertrand’s postulate (Exercise 1.10.3) we can find a prime $p$ between $2N$ and $4N$. We identify $A$ with a subset of $\mathbf{Z}_p$ in the usual manner (and give $\mathbf{Z}_p$ the standard bilinear form), and observe from (10.2) and the hypothesis on $A$ that

$$ \Lambda_3(1_A, 1_A, 1_A) = \frac{1}{p^2} |A| \leq \frac{\delta}{4N}. $$

Let us now split $1_A = f_U + f_{U^\perp}$, where $f_{U^\perp} := \delta 1_{[1,N]}$ and $f_U := 1_A - f_{U^\perp}$. A simple computation shows that

$$ \Lambda_3(f_{U^\perp}, f_{U^\perp}, f_{U^\perp}) \geq \frac{\delta^3}{100} $$

(say). By hypothesis on $N$, we conclude

$$ |\Lambda_3(f_U + f_{U^\perp}, f_U + f_{U^\perp}, f_U + f_{U^\perp}) - \Lambda_3(f_{U^\perp}, f_{U^\perp}, f_{U^\perp})| = \Omega(\delta^3). $$

The left-hand side can be split as the sum of seven terms, so at least one of them is $\Omega(\delta^3)$. For sake of discussion let us suppose that

$$ |\Lambda_3(f_U, f_U, f_U)| = \Omega(\delta^3); $$

the other six cases are similar (the point being that all of them involve at least one copy of $f_U$). Using (10.6) and the triangle inequality, we conclude that

$$ \sum_{\xi \in \mathbf{Z}_p} |\hat f_U(\xi)|^2 |\hat f_U(-2\xi)| = \Omega(\delta^3). $$

On the other hand, from Plancherel’s theorem we have

$$ \sum_{\xi \in \mathbf{Z}_p} |\hat f_U(\xi)|^2 = \|f_U\|_{L^2(\mathbf{Z}_p)}^2 = O\big(\|1_A\|_{L^2(\mathbf{Z}_p)}^2 + \|f_{U^\perp}\|_{L^2(\mathbf{Z}_p)}^2\big) = O(\delta). $$

We thus conclude that there exists $\xi \in \mathbf{Z}_p$ such that

$$ |\hat f_U(-2\xi)| = \Omega(\delta^2), $$

thus

$$ |\mathbf{E}_{n \in [1,N]} (1_A(n) - \delta) e(2n\xi/p)| = \Omega(\delta^2). $$

The claim follows.





Similarly, we have the following analog of Lemma 10.15. 


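Lemma 10.25 (Non-uniformity implies density increment) [287] Let $f : \mathbf{Z} \to \mathbf{R}$ be a function supported on an arithmetic progression $P$ such that $|f(n)| \leq 1$ for all $n$, $\sum_n f(n) = 0$, and

$$ |\mathbf{E}_{n \in P} f(n) e(n\xi)| \geq \sigma $$

for some $\xi \in \mathbf{R}/\mathbf{Z}$ and $\sigma > 0$. Then there exists a proper arithmetic progression $P' \subset P$ with $|P'| = \Omega(\sigma^2 |P|^{1/2})$ and

$$ \mathbf{E}_{n \in P'} f(n) \geq \sigma/4. $$

Proof Again we may take $P = [1, N]$. Using the Kronecker approximation theorem (Corollary 3.25) we can find an integer $1 \leq r \leq N^{1/2}$ such that $\|r\xi\|_{\mathbf{R}/\mathbf{Z}} \leq N^{-1/2}$. Let $P_0$ denote the progression $[1, \sigma N^{1/2}/100] \cdot r$. Then we have

$$ \Big| \sum_n \mathbf{E}_{x \in P_0} f(n + x) e((n + x)\xi) \Big| = \Big| \sum_n f(n) e(n\xi) \Big| \geq \sigma N, $$

where $e$ is defined in equation (4.1). On the other hand, for $x \in P_0$ we see from (4.24) that $|e(x\xi) - 1| \leq \sigma/10$, and so

$$ \Big| \sum_n \mathbf{E}_{x \in P_0} f(n + x) e(n\xi) (e(x\xi) - 1) \Big| \leq \sum_{n \in [-N, N]} \sigma/10 \leq \sigma N/2 $$

(say), and so by the triangle inequality

$$ \Big| \sum_n \mathbf{E}_{x \in P_0} f(n + x) e(n\xi) \Big| \geq \sigma N/2. $$

In particular there exists a phase $\theta \in \mathbf{R}/\mathbf{Z}$ such that

$$ \mathrm{Re} \sum_n \mathbf{E}_{x \in P_0} f(n + x) e(n\xi + \theta) \geq \sigma N/2. $$

Since $f$ sums to zero, we have $\sum_n \mathbf{E}_{x \in P_0} f(n + x) = 0$, and hence

$$ \sum_n \mathbf{E}_{x \in P_0} f(n + x)\, \mathrm{Re}(1 + e(n\xi + \theta)) \geq \sigma N/2. $$

Note that the sum is only non-zero when $n \in [-N, N]$. By the pigeonhole principle, there thus exists an $n$ such that

$$ \mathbf{E}_{x \in n + P_0} f(x) = \mathbf{E}_{x \in P_0} f(n + x) \geq \sigma/4. $$

Since $f$ is bounded and supported in $[1, N]$, we conclude in particular that

$$ |(n + P_0) \cap [1, N]| \geq \sigma |P_0|/4 = \Omega(\sigma^2 N^{1/2}). $$

The claim then follows by taking $P' := (n + P_0) \cap [1, N]$.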











Combining this with the preceding proposition, we conclude 


Corollary 10.26 (Lack of progressions implies density increment) Let $A \subset P$ be such that $|A| = \delta |P|$ for some $0 < \delta \leq 1$. Assume that $|P| \geq 100/\delta^2$. Suppose also that $A$ contains no proper arithmetic progressions of length 3. Then there exists a proper arithmetic progression $P'$ in $P$ with $|P'| = \Omega(\delta^4 |P|^{1/2})$ such that we have the density increment

$$ \mathbf{P}_{P'}(A) \geq \mathbf{P}_P(A) + \Omega(\delta^2). $$

By iterating this Corollary, one can eventually show that $r_3([1, N]) = O(N/\log \log N)$; we leave this as an exercise to the reader.
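To see informally where the double logarithm comes from, one can track the two quantities in Corollary 10.26 through repeated applications. The Python sketch below is a toy model only: the function name `iterate_increment` is hypothetical, and the implied constants in $\Omega(\delta^2)$ and $\Omega(\delta^4 |P|^{1/2})$ are both set to 1, which is an assumption made purely for illustration. It works with $\log_2 |P|$ so that astronomically long progressions can be represented.

```python
import math

def iterate_increment(delta, log2_N, c_len=1.0, c_inc=1.0):
    """Iterate the toy recursion
         delta -> delta + c_inc * delta^2,   |P| -> c_len * delta^4 * |P|^(1/2)
    for as long as the corollary applies (|P| >= 100 / delta^2) and the
    density has not yet exceeded 1. Returns (steps, delta, log2 of |P|)."""
    steps = 0
    while delta <= 1.0 and log2_N > math.log2(100.0 / delta ** 2):
        # new length uses the current density, then the density is incremented
        log2_N = math.log2(c_len) + 4.0 * math.log2(delta) + log2_N / 2.0
        delta += c_inc * delta ** 2
        steps += 1
    return steps, delta, log2_N

# Long progression: the density is driven above 1, which is absurd,
# so in this model a progression-free set of density 0.1 cannot exist.
print(iterate_increment(0.1, 1e9))
# Short progression: the length hypothesis fails first, and nothing is concluded.
print(iterate_increment(0.1, 100.0))
```

In this model the density doubles after at most $1/\delta$ steps, so roughly $O(1/\delta)$ steps suffice in total, while each step halves $\log |P|$; a contradiction is therefore reached once $\log \log N$ is a large multiple of $1/\delta$, which is consistent with the bound $O(N/\log\log N)$.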

There has been some recent progress in understanding the structure of subsets 
of Z/NZ which attain the minimal number of progressions of length 3 among all 
sets with a given density; see [65]. It may be that this will lead to an alternative 
proof of Roth’s theorem. 


Exercises 


10.3.1 [287] By iterating Corollary 10.26, establish the bound $r_3(P) = O(N/\log \log N)$ for any arithmetic progression $P$ of integers of length $N$, and hence $r_3(\mathbf{Z}_N) = O(N/\log \log N)$.

10.3.2 [372] Let $f : \mathbf{Z}_N \to \mathbf{R}^+$ be such that $0 \leq f(x) \leq 1$ for all $x \in \mathbf{Z}_N$. By using the previous exercise and arguing as in Proposition 10.17, show that

$$ \Lambda_3(f, f, f) = \Omega\big(\exp(-\exp(O(1/\mathbf{E}_{\mathbf{Z}_N}(f))))\big). $$


10.4 Quantitative bounds 


In the preceding section we obtained a bound of $O(N/\log \log N)$ for the quantity $r_3([1, N])$. The main reason for this double logarithm lies in the use of Kronecker’s theorem in Lemma 10.25, which reduces the size of the progression $P$ by roughly a square root, while only increasing the density by a small amount $\Omega(\delta^2)$. This step is so inefficient that it is worthwhile to make the other parts of the argument more complicated in order to reduce the number of times one invokes Kronecker’s theorem. One such approach, due to Heath-Brown and Szemerédi, is to apply Kronecker’s theorem to a large batch of frequencies at once, rather than one at a time. It yields the following improvement¹:


Theorem 10.27 [177, 344] For all large $N$, we have $r_3(\mathbf{Z}_N), r_3([1, N]) = O(N/\log^c N)$ for some absolute constant $c > 0$.


Proof It suffices to verify the claim for r3([1, N]). We refine the arguments in 
the preceding section, again skipping some details. First we need the following 
variant of Proposition 10.24. 


Proposition 10.28 (Lack of progressions implies non-uniformity) Let $A \subset [1, N]$ be such that $|A| = \delta N$ for some $0 < \delta \leq 1$ and such that $A$ has no proper arithmetic progressions of length 3. Suppose also that $N \geq 100/\delta^2$. Let $p$ be a prime between $N$ and $2N$, and identify $[1, N]$ with a subset of $\mathbf{Z}_p$. Let $f_U : \mathbf{Z}_p \to \mathbf{R}$ be the function $f_U := 1_A - \delta 1_{[1,N]}$. Then there exists a set $S \subset \mathbf{Z}_p$ such that $|S| = O(\delta^{-10})$ and

$$ \sum_{\xi \in S} |\hat f_U(\xi)|^2 = \Omega(\delta^2 |S|^{1/10}). $$

EES 


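Proof Write $f_{U^\perp} := \delta 1_{[1,N]}$. Arguing as in Proposition 10.24, we conclude once again that

$$ \sum_{\xi \in \mathbf{Z}_p} |\hat f_U(\xi)|^2 |\hat f_U(-2\xi)| = \Omega(\delta^3) $$

or something very similar to this. A direct calculation (which we leave as an exercise) also shows that

$$ \sum_{\xi \in \mathbf{Z}_p} |\hat f_{U^\perp}(\xi)|^3 = O(\delta^3) \qquad (10.13) $$

and hence by Hölder’s inequality we have

$$ \sum_{\xi \in \mathbf{Z}_p} |\hat f_U(\xi)|^3 = \Omega(\delta^3). \qquad (10.14) $$

Now suppose for contradiction that

$$ \sum_{\xi \in S} |\hat f_U(\xi)|^2 \leq c \delta^2 |S|^{1/10} $$

for all sets $S$ and some small $c > 0$ to be chosen later. Applying this in particular to the set $S = \{\xi : |\hat f_U(\xi)| \geq \lambda\}$ for some arbitrary parameter $\lambda$, we see that

$$ \lambda^2 |\{\xi : |\hat f_U(\xi)| \geq \lambda\}| \leq c \delta^2 |\{\xi : |\hat f_U(\xi)| \geq \lambda\}|^{1/10} $$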


1 We thank Ben Green for presenting these arguments to the authors. 


and hence

$$ |\{\xi : |\hat f_U(\xi)| \geq \lambda\}| \leq c^{10/9} \delta^{20/9} \lambda^{-20/9}. $$

Multiplying this by $3\lambda^2$ and integrating we obtain

$$ 3 \int_0^\delta \lambda^2 |\{\xi : |\hat f_U(\xi)| \geq \lambda\}|\ d\lambda = O(c^{10/9} \delta^3). $$

But one can easily verify (e.g. using (4.15)) that $|\hat f_U(\xi)| \leq \delta$, and so the left-hand side simplifies to $\sum_{\xi \in \mathbf{Z}_p} |\hat f_U(\xi)|^3$. But this will contradict (10.14) if $c$ is sufficiently small. The claim follows.














Now we need the following variant of Lemma 10.25. 


Lemma 10.29 (Non-uniformity implies density increment) [287] Let $N$ and $p$ be as in the preceding proposition. Let $f : \mathbf{Z}_p \to \mathbf{R}$ be a function supported on $[1, N]$ such that $|f(n)| \leq 1$ for all $n$, and such that

$$ \sum_{\xi \in S} |\hat f(\xi)|^2 \geq \sigma \qquad (10.15) $$

for some set $S \subset \mathbf{Z}_p$ and some $\sigma > 0$. Then there exists a proper arithmetic progression $P' \subset [1, N]$ with $|P'| = \Omega(\sigma N^{1/(|S|+1)})$ and

$$ |\mathbf{E}_{n \in P'} f(n)| = \Omega(\sigma). $$


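Proof By Kronecker’s theorem, we can find $1 \leq r \leq N^{|S|/(|S|+1)}$ such that $\|r\xi\|_{\mathbf{R}/\mathbf{Z}} \leq N^{-1/(|S|+1)}$ for all $\xi \in S$. Let $Q$ be the progression $Q = [1, N^{1/(|S|+1)}/10] \cdot r$; then a simple computation shows that

$$ \frac{1}{\mathbf{P}_{\mathbf{Z}_p}(Q)} |\hat 1_Q(\xi)| = \Theta(1) \ \hbox{ for all } \xi \in S. $$

In particular, from (4.2), (4.9), (10.15) and the previous line we have

$$ \Big\| f * \frac{1}{\mathbf{P}_{\mathbf{Z}_p}(Q)} 1_Q \Big\|_{L^2(\mathbf{Z}_p)}^2 = \sum_{\xi \in \mathbf{Z}_p} \frac{1}{\mathbf{P}_{\mathbf{Z}_p}(Q)^2} |\hat f(\xi)|^2 |\hat 1_Q(\xi)|^2 = \Omega(\sigma). $$

On the other hand, from the boundedness of $f$ we have

$$ \Big\| f * \frac{1}{\mathbf{P}_{\mathbf{Z}_p}(Q)} 1_Q \Big\|_{L^1(\mathbf{Z}_p)} \leq \|f\|_{L^1(\mathbf{Z}_p)} = O(1). $$

Hence by Hölder’s inequality we have

$$ \Big\| f * \frac{1}{\mathbf{P}_{\mathbf{Z}_p}(Q)} 1_Q \Big\|_{L^\infty(\mathbf{Z}_p)} = \Omega(\sigma). $$

Thus there exists $x \in \mathbf{Z}_p$ such that

$$ |\mathbf{E}_{y \in x - Q} f(y)| = \Omega(\sigma). $$

Setting $P' = [1, N] \cap (x - Q)$, the claim follows.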
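The simultaneous approximation step that opens the proof above rests on a pigeonhole principle: for any frequencies $\xi_1, \ldots, \xi_d$ and any integer $Q \geq 1$ there is some $1 \leq r \leq Q^d$ with $\|r\xi_j\|_{\mathbf{R}/\mathbf{Z}} \leq 1/Q$ for every $j$. The Python sketch below illustrates this by direct search; it is not the book’s formulation of Kronecker’s theorem, the function names are hypothetical, and the frequencies are assumed to be rationals of the form $a/p$ so that exact arithmetic with `fractions.Fraction` can be used.

```python
from fractions import Fraction

def dist_to_nearest_integer(x):
    """Distance from the rational x to the nearest integer, i.e. ||x|| in R/Z."""
    frac = x - (x.numerator // x.denominator)  # fractional part in [0, 1)
    return min(frac, 1 - frac)

def simultaneous_approximation(xis, Q):
    """Smallest 1 <= r <= Q**d with ||r * xi|| <= 1/Q for every xi in xis,
    where d = len(xis); the pigeonhole principle guarantees that one exists."""
    d = len(xis)
    bound = Fraction(1, Q)
    for r in range(1, Q ** d + 1):
        if all(dist_to_nearest_integer(r * xi) <= bound for xi in xis):
            return r
    return None

# Three frequencies a/p with p = 101 (arbitrary illustrative values).
xis = [Fraction(17, 101), Fraction(45, 101), Fraction(88, 101)]
r = simultaneous_approximation(xis, 4)
print(r, [dist_to_nearest_integer(r * xi) for xi in xis])
```

Taking $Q = N^{1/(|S|+1)}$ and $d = |S|$ gives a step $r \leq N^{|S|/(|S|+1)}$ with all $\|r\xi\|_{\mathbf{R}/\mathbf{Z}} \leq N^{-1/(|S|+1)}$, which shows the cost of handling many frequencies at once: the usable progression length shrinks to about $N^{1/(|S|+1)}$.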








The rest of the proof is similar to the arguments in the previous section and is 
left as an exercise. 














A further refinement was achieved by Bourgain [39], dispensing with the need 
for Kronecker’s theorem altogether. The idea was to avoid using arithmetic pro- 
gressions, but work entirely with Bohr sets, and in particular with regular Bohr 
sets. As a consequence, the following result was obtained, which seems to be very 
close to the limit of the Fourier-analytic method (it is in some sense the natural 
generalization of Proposition 10.12): 


Theorem 10.30 (Roth–Bourgain theorem) For additive groups $Z$ of large finite odd order, we have $r_3(Z) = O\big(\sqrt{\frac{\log \log |Z|}{\log |Z|}}\, |Z|\big)$. In particular for all large $N$

$$ r_3(\mathbf{Z}_N),\ r_3([1, N]) = O\Big(\sqrt{\frac{\log \log N}{\log N}}\, N\Big). $$

This theorem follows easily from the following variant, which can be viewed as a generalization of Proposition 10.17:


Theorem 10.31 For all additive groups $Z$ of large finite odd order, and all $f : Z \to \mathbf{R}^+$ with $0 \leq f(x) \leq 1$, we have $\Lambda_3(f, f, f) = \Omega\big(\mathbf{E}_Z(f)^{O(1/\mathbf{E}_Z(f)^2)}\big)$.


We leave the deduction of Theorem 10.30 from Theorem 10.31 as an exercise 
to the reader. To prove Theorem 10.31, the main tool shall be the following result, 
which is a substitute for Corollary 10.26. 


Proposition 10.32 (Lack of progressions implies density increment) Let Z be 
an additive group of large odd order, let Bohr(S, p) be a regular Bohr set of rank 
d, and let f : Z —> R* be such that0 < f(x) < 1 and Exexo+Bohr(S,p) f (x) = 6 for 
some xo E€ Z. Suppose also that 


5 \ 100 d 
AKA AS (a) o) 


Then there exists a regular Bohr set Bohr(S’, p’) of rank at most d + 1 and radius 
p Š Ip and an element xj € Z such that 


Ey ex) +Bonr(s’,p) f(x) > 5+ re aes 


The deduction of Theorem 10.31 from Proposition 10.32 is also straightforward, 
and is left as another exercise to the reader. 




Proof By translation we may take xo = 0. By increasing ô if necessary we may 
assume Ey cponr(s,p) f (x) = 5. By reducing f to zero outside of Bohr(S, p), we may 
assume that f is supported on Bohr(S, p). Now suppose for sake of contradiction 
that 


Ex ex +Bonns’,p) f(x) < 6+ 67/2" (10.16) 


for all x € Z and all Bohr sets Bohr(S’, p’) of rank at most d + 1 and radius at 
least (4,)°"p. 
By Lemma 4.25 we can find 0 < p3 < p2 < pı < p such that for each j = 


1, 2, 3, we have! 
g \ 10i+1 5 \ 10 
ae pe a 
(=) p<o=(3) pP 


and that Bohr(S, oj) is regular. Note that the sets Bohr(S, o), Bohr(S, p1), 
Bohr(S, 2), Bohr(S, p3) will differ in size by factors of 5°, which will be too 
large for our application. Hence we shall have to keep careful track of the densities 
of each of these Bohr sets separately. 

By hypothesis and a change of variable, we have 


100 \4 
Ey rez f(x —nN) fof +r=Aslf f, n= ((5) o) 


in particular, from (4.25) we have 


53 
Ey rez f(x —Nf@fatns gez BCS, p)Pz(Bohr(S, p1)) 


(say). Since f is non-negative, we can localize r to Bohr(S, p1) and conclude 


83 
Eyez:reBonr(s py f& —NFOFA +r) S gPzBohCS, p)). (10.17) 


Write fy. := ôlBohr(s,o)- From the symmetry of the above expression in r, one can 
verify the identity 


EzcZ:reBoh(S, p1) fX =r)f x) fx +r) 
= Exez:reBohr(S,o,) fu X = r)f (x) fur + r) (10.18) 
+ Eyezreponns,oy(f — fur —r) ff + fur) +r). 


1 The reader should not take the numerical quantities (especially the powers of 2) too seriously in this 
argument; they are certainly not optimal. 







Observe that if x € Bohr(S,o—,) and r € Bohr(S, p1), then x+re 
Bohr(S, o), and therefore 


E,<pohu(s,o) fue — FO) fur +r) = 8 f(a). 
Thus by positivity of f and fy 
Exezireponn(s,o) fur — 1) f(a) fur +r) > 8 Exez fB, p-p). 
By hypothesis we have 
Eyez f() lBonr(s,p)(") = 6Pz(Bohr(S, p)) 


while from regularity of Bohr(S, 0) we have 


ô 
Ezez f œ) Lponns,p—py(") < zP z(Bohr(S, p)) 


(say). Combining the above three estimates we obtain 


8? 
EzcZ:reBoh(S, o) fut xX > r)f (x) fur F r)lBoh(S, o) r) = z Pz@Bohr(s, p)) 


Combining this with (10.17), (10.18) we conclude that 


83 
|ExeZ;reBohs, o) (f — fur =r) FOF + fu) +r) 2 g PzP, p)); 


we shift this by r to obtain 


8? 
|Evez:reBonr(s,ay(f — fuf Œ + NGF + fu) + 2r)| > giz Bons, p)); 


We would like to use this fact to deduce some linear bias in f — fy1. Unfortunately 
the constraintr € Bohr(S, p1) is not favorable (it localizes r to a smaller scale than 
x). To resolve this we need to localize the x variable to a smaller scale, namely p2. 
To do this we write x = y + z where z is restricted to Bohr(S, p2), and conclude 
that 


|Eyez:reBohr(S,p1):zeBohr(S,n)(f — fury +2 fy +z +r f+ fury + z+ 2r)| 
53 
> q PzBohr(s, p)). 


Observe that we may localize y to Bohr(S, o + p2) since the expression inside the 
expectation vanishes otherwise. Since f is bounded and Bohr(S, p) is regular, the 
contribution of Bohr(S, o + o2)\Bohr(S, p) can be crudely bounded by 


8? 
Pz(Bohr(S, p + p2)) — Pz(Bohr(S, p)) < gr eons; Pp — p2)) 




(say). Thus we can restrict y to Bohr(S, p — p2) and use the triangle inequality to 
obtain 


8? 
EycBonr(S,p—p2) FO) = g 


where 
F(y) := |Ereponr(s,1):zeBonr(s,.)(F — fury + Of +z +r + fury +z +2r)l. 


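Now that the position variable $z$ is localized to a smaller scale than the shift variable $r$ we may now remove the shift restriction $r \in \mathrm{Bohr}(S, \rho_1)$ as follows. We rewrite

\[ F(y) = \frac{|\mathbf{E}_{z \in \mathrm{Bohr}(S,\rho_2)} \mathbf{E}_{r \in Z} 1_{\mathrm{Bohr}(S,\rho_1)}(r) (f - f_{U^\perp})(y+z) f(y+z+r) (f + f_{U^\perp})(y+z+2r)|}{\mathbf{P}_Z(\mathrm{Bohr}(S,\rho_1))}. \]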





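Now note that for each fixed $y$ and each fixed $z \in \mathrm{Bohr}(S, \rho_2)$, the function

\[ 1_{\mathrm{Bohr}(S,\rho_1)}(r) - 1_{y+\mathrm{Bohr}(S,\rho_1)}(y+z+r) 1_{y+2\cdot\mathrm{Bohr}(S,\rho_1)}(y+z+2r) \]

has an $L^1(Z)$ norm in the $r$ variable of at most $\mathbf{P}_Z(\mathrm{Bohr}(S, \rho_1 + 2\rho_2) \setminus \mathrm{Bohr}(S, \rho_1 - 2\rho_2))$, which by the regularity of $\mathrm{Bohr}(S, \rho_1)$ will be at most $\frac{\delta^3}{32} \mathbf{P}_Z(\mathrm{Bohr}(S, \rho_1))$.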
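Using this and the boundedness of $f$, we see that if we write

\[ \tilde F(y) := \frac{1}{\mathbf{P}_Z(\mathrm{Bohr}(S,\rho_1))} \Big| \mathbf{E}_{z \in \mathrm{Bohr}(S,\rho_2)} \mathbf{E}_{r \in Z} 1_{y+\mathrm{Bohr}(S,\rho_1)}(y+z+r) 1_{y+2\cdot\mathrm{Bohr}(S,\rho_1)}(y+z+2r) (f - f_{U^\perp})(y+z) f(y+z+r) (f + f_{U^\perp})(y+z+2r) \Big| \]

\[ = \frac{|\Lambda_3((f - f_{U^\perp}) 1_{y+\mathrm{Bohr}(S,\rho_2)}, f 1_{y+\mathrm{Bohr}(S,\rho_1)}, (f + f_{U^\perp}) 1_{y+2\cdot\mathrm{Bohr}(S,\rho_1)})|}{\mathbf{P}_Z(\mathrm{Bohr}(S,\rho_1)) \mathbf{P}_Z(\mathrm{Bohr}(S,\rho_2))} \]

then $F(y)$ and $\tilde F(y)$ differ by at most $\delta^3/16$. In particular we have

\[ \mathbf{E}_{y \in \mathrm{Bohr}(S,\rho-\rho_2)} \tilde F(y) \geq \frac{\delta^3}{16}. \qquad (10.19) \]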


At this point we need to pause to address a technical issue, namely that the function $(f - f_{U^\perp}) 1_{y+\mathrm{Bohr}(S,\rho_2)}$ may have non-zero mean. Fortunately this can be dealt with by the first moment method. Let $G(y)$ denote the function

\[ G(y) := \mathbf{E}_{x \in y+\mathrm{Bohr}(S,\rho_2)} (f - f_{U^\perp})(x). \]

Since $f$ and $f_{U^\perp}$ range between 0 and 1 and have the same mean, we see that $G$ is bounded in magnitude by 1 and has mean zero. Also, $G(y)$ vanishes when $y \notin \mathrm{Bohr}(S, \rho + \rho_2)$, while from (10.16) we see that $G(y)$ is bounded above by




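$\delta^2/2^{10}$ when $y \in \mathrm{Bohr}(S, \rho - \rho_2)$. Since $\mathrm{Bohr}(S, \rho)$ is regular, we thus see that

\[ \mathbf{E}_{y \in Z} \max(G(y), 0) \leq \frac{\delta^2}{2^{10}} \mathbf{P}_Z(\mathrm{Bohr}(S, \rho - \rho_2)) + \mathbf{P}_Z(\mathrm{Bohr}(S, \rho + \rho_2) \setminus \mathrm{Bohr}(S, \rho - \rho_2)) \leq \frac{\delta^2}{2^{9}} \mathbf{P}_Z(\mathrm{Bohr}(S, \rho - \rho_2)). \]

Since $|G(y)| = -G(y) + 2\max(G(y), 0)$ and $G$ has mean zero, we thus have

\[ \mathbf{E}_{y \in \mathrm{Bohr}(S,\rho-\rho_2)} |G(y)| \leq \frac{\mathbf{E}_{y \in Z} |G(y)|}{\mathbf{P}_Z(\mathrm{Bohr}(S,\rho-\rho_2))} \leq \frac{\delta^2}{2^{8}}; \]

we can combine this with (10.19) to obtain

\[ \mathbf{E}_{y \in \mathrm{Bohr}(S,\rho-\rho_2)} \big( \tilde F(y) - 8\delta |G(y)| \big) \geq \frac{\delta^3}{32} \]

and thus there exists $y \in \mathrm{Bohr}(S, \rho - \rho_2)$ such that

\[ \tilde F(y) \geq 8\delta |G(y)| + \frac{\delta^3}{32}. \]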

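We fix this $y$ and return to the analysis of $\tilde F(y)$. From Proposition 10.11 we have

\[ \tilde F(y) \leq \frac{1}{\mathbf{P}_Z(\mathrm{Bohr}(S,\rho_1)) \mathbf{P}_Z(\mathrm{Bohr}(S,\rho_2))} \|(f - f_{U^\perp}) 1_{y+\mathrm{Bohr}(S,\rho_2)}\|_{u^2(Z)} \times \|f 1_{y+\mathrm{Bohr}(S,\rho_1)}\|_{L^2(Z)} \|(f + f_{U^\perp}) 1_{y+2\cdot\mathrm{Bohr}(S,\rho_1)}\|_{L^2(Z)}. \]

From (10.16) we have

\[ \|f 1_{y+\mathrm{Bohr}(S,\rho_1)}\|_{L^2(Z)} \leq (2\delta \mathbf{P}_Z(\mathrm{Bohr}(S,\rho_1)))^{1/2} \]

and

\[ \|(f + f_{U^\perp}) 1_{y+2\cdot\mathrm{Bohr}(S,\rho_1)}\|_{L^2(Z)} \leq (8\delta \mathbf{P}_Z(\mathrm{Bohr}(S,\rho_1)))^{1/2}. \]

Thus we have

\[ \tilde F(y) \leq \frac{4\delta}{\mathbf{P}_Z(\mathrm{Bohr}(S,\rho_2))} \|(f - f_{U^\perp}) 1_{y+\mathrm{Bohr}(S,\rho_2)}\|_{u^2(Z)}. \]

Thus there exists $\xi \in Z$ such that

\[ \frac{1}{\mathbf{P}_Z(\mathrm{Bohr}(S,\rho_2))} \big| [(f - f_{U^\perp}) 1_{y+\mathrm{Bohr}(S,\rho_2)}]^\wedge(\xi) \big| \geq 2|G(y)| + \frac{\delta^2}{128}. \]

Since $y \in \mathrm{Bohr}(S, \rho - \rho_2)$, we have $f_{U^\perp} = \delta$ on $y + \mathrm{Bohr}(S, \rho_2)$. We can therefore find a phase $\theta \in \mathbf{R}/\mathbf{Z}$ such that

\[ \mathrm{Re}\, \mathbf{E}_{x \in y+\mathrm{Bohr}(S,\rho_2)} (f(x) - \delta) e(-\xi \cdot x + \theta) \geq 2 |\mathbf{E}_{x \in y+\mathrm{Bohr}(S,\rho_2)} (f - \delta)| + \frac{\delta^2}{128}. \]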




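In particular, by the triangle inequality we have

\[ \mathbf{E}_{x \in y+\mathrm{Bohr}(S,\rho_2)} (f(x) - \delta) [2 + \mathrm{Re}\, e(-\xi \cdot x + \theta)] \geq \frac{\delta^2}{128}. \]

The only remaining task is to eradicate the multiplier $2 + \mathrm{Re}\, e(-\xi \cdot x + \theta)$. This shall be done by replacing the Bohr set $\mathrm{Bohr}(S, \rho_2)$ with the narrower one $\mathrm{Bohr}(S', \rho_3)$, where $S' := S \cup \{\xi\}$. Writing $x = w + z$ where $z \in \mathrm{Bohr}(S', \rho_3)$, we see that

\[ \mathbf{E}_{w \in Z; z \in \mathrm{Bohr}(S',\rho_3)} 1_{y+\mathrm{Bohr}(S,\rho_2)}(w+z) (f(w+z) - \delta) [2 + \mathrm{Re}\, e(-\xi \cdot w + \theta) e(-\xi \cdot z)] \geq \frac{\delta^2}{128} \mathbf{P}_Z(\mathrm{Bohr}(S,\rho_2)). \]

Since $z \in \mathrm{Bohr}(S', \rho_3)$, we have $|e(-\xi \cdot z) - 1| \leq 2\pi\rho_3$ by (4.24). It is then easy to replace $e(-\xi \cdot z)$ by 1 incurring an error of at most $\frac{\delta^2}{512} \mathbf{P}_Z(\mathrm{Bohr}(S, \rho_2))$ (say), concluding that

\[ \mathbf{E}_{w \in Z; z \in \mathrm{Bohr}(S',\rho_3)} 1_{y+\mathrm{Bohr}(S,\rho_2)}(w+z) (f(w+z) - \delta) [2 + \mathrm{Re}\, e(-\xi \cdot w + \theta)] \geq \frac{3\delta^2}{512} \mathbf{P}_Z(\mathrm{Bohr}(S,\rho_2)). \]

A similar argument (exploiting the regularity of $\mathrm{Bohr}(S, \rho_2)$) allows one to replace the cut-off $1_{y+\mathrm{Bohr}(S,\rho_2)}(w+z)$ by $1_{y+\mathrm{Bohr}(S,\rho_2)}(w)$, to obtain

\[ \mathbf{E}_{w \in Z; z \in \mathrm{Bohr}(S',\rho_3)} 1_{y+\mathrm{Bohr}(S,\rho_2)}(w) (f(w+z) - \delta) [2 + \mathrm{Re}\, e(-\xi \cdot w + \theta)] \geq \frac{\delta^2}{256} \mathbf{P}_Z(\mathrm{Bohr}(S,\rho_2)) \]

which we rewrite as

\[ \mathbf{E}_{w \in y+\mathrm{Bohr}(S,\rho_2)} [2 + \mathrm{Re}\, e(-\xi \cdot w + \theta)] \big( \mathbf{E}_{x \in w+\mathrm{Bohr}(S',\rho_3)} f(x) - \delta \big) \geq \frac{\delta^2}{256}. \]

On the other hand, from (10.16) and the bound $2 + \mathrm{Re}\, e(-\xi \cdot w + \theta) \leq 3$ the left-hand side is bounded by $3\delta^2/2^{10}$, a contradiction.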














Exercises 


10.4.1 Prove (10.13). 

10.4.2 Complete the proof of Theorem 10.27 given Proposition 10.28 and 
Lemma 10.15. 

10.4.3 Deduce Theorem 10.30 from Theorem 10.31. 

10.4.4 Deduce Theorem 10.31 from Proposition 10.32. (Hint: use an iteration argument with about $O(1/\mathbf{E}_Z(f))$ steps, with parameter sizes $\delta = \Omega(\mathbf{E}_Z(f))$, $d = O(1/\mathbf{E}_Z(f))$ and $\rho = \Omega(\mathbf{E}_Z(f)^{C/\mathbf{E}_Z(f)})$ throughout the iteration.)




10.5 An ergodic argument 


In 1977, Furstenberg [121] gave a spectacular new proof of Szemerédi’s theorem 
(and hence Roth’s theorem), using the methods of ergodic theory rather than Fourier 
analysis or combinatorics. The argument relies on very little arithmetic structure, 
being based almost entirely on an analysis of the mixing properties of the shift 
operator TA := A + 1 on a set A of integers. As such it is very flexible and has led 
to several wide-ranging generalizations of Szemerédi’s theorem, some of which 
we will discuss in the next chapter. 

The initial ergodic arguments of Furstenberg were infinitary in nature, working 
with the integers Z, and in fact embedding these integers in an abstract measure- 
preserving system (X, B, T, u). In several versions of the argument, the axiom 
of choice (in the guise of Zorn’s lemma) was used to obtain a suitable structural 
decomposition of this measure-preserving system. More recently, however, there 
has been progress in establishing finitary versions of this argument, in which one 
works in a concrete and finite measure-preserving system, such as the cyclic group
$Z_N$ with the standard shift $TA := A + 1$. These finitary arguments, which were
inspired by the Szemerédi regularity lemma, to be introduced in the next section, are
somewhat messier than the elegant infinitary arguments, but lead to explicit (albeit
poor) quantitative bounds for $r(Z_N)$. Also these finitary ergodic arguments played
an essential role in the proof of the Green—Tao theorem concerning progressions 
in the primes. 

In this section we give a finitary ergodic proof of Roth’s theorem, using a 
formulation from [358]. The proof is not fully ergodic because we shall exploit 
the Fourier transform, but in the next chapter we will discuss how one can remove 
this dependence on the Fourier transform (and thus extend the argument to higher 
k). The precise result we shall establish is 


Theorem 10.33 For all finite groups $Z$, and all $f : Z \to \mathbf{R}^+$ with $0 \leq f(x) \leq 1$ and $\mathbf{E}_Z(f) \geq \delta$, we have $\Lambda_3(f, f, f) = \Omega_\delta(1)$.


This is of course weaker than what one can obtain by purely Fourier-analytic 
methods such as Theorem 10.31, but the proof is somewhat different and is easier 
to extend to higher k. In particular, it replaces the density increment argument 
of previous sections by an energy increment argument. Whereas in the previous
arguments one constructed a series of objects (progressions or Bohr sets) on which
$f$ had increasingly large density, here we construct a series of $\sigma$-algebras or
partitions with respect to which $f$ has increasingly large energy. This eventually
leads to constructing a “low-complexity” approximation $f_{U^\perp}$ to $f$, where the error
$f_U := f - f_{U^\perp}$ is linearly uniform and thus has negligible impact on $\Lambda_3(f, f, f)$.




The low-complexity approximation $f_{U^\perp}$ turns out to be almost periodic, which
will lead to a lower bound on $\Lambda_3(f_{U^\perp}, f_{U^\perp}, f_{U^\perp})$.

We turn to the details, beginning with the definition of almost periodicity. For 
convenience we shall take advantage of the Fourier transform to define this notion, 
though it is not essential (see exercises). 


Definition 10.34 (Almost periodicity) Let $K \geq 1$ be an integer and $\sigma > 0$. We say that a function $f : Z \to \mathbf{C}$ is $K$-quasiperiodic if there exist frequencies $\xi_1, \ldots, \xi_K \in Z$ (possibly repeated) and complex numbers $c_1, \ldots, c_K$ with $|c_1|, \ldots, |c_K| \leq 1$ such that $f = \sum_{j=1}^K c_j e_{\xi_j}$, or in other words

\[ f(x) = \sum_{j=1}^{K} c_j e(x \cdot \xi_j). \]

We say that a function $f : Z \to \mathbf{C}$ is $(K, \sigma)$-almost periodic if there exists a $K$-quasiperiodic function $g$ such that $\|f - g\|_{L^2(Z)} \leq \sigma$.


A key observation is that Theorem 10.33 is easy to prove for almost periodic 
functions, if K is not too large and o is sufficiently small. More precisely, we have 


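Proposition 10.35 (Almost periodic functions are recurrent) Let $f : Z \to \mathbf{R}^+$ be such that $0 \leq f(x) \leq 1$ and $\mathbf{E}_Z(f) \geq \delta$. If $f$ is $(K, \sigma)$-almost periodic for some $K \geq 1$ and $0 < \sigma \leq \delta^3/8$, then

\[ \Lambda_3(f, f, f) = \Omega((\delta^3/K)^K \delta^3). \]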


This proposition should be compared with Lemma 4.44. A key point here is
that the smallness condition on $\sigma$ does not involve $K$. This will be important for
us as $K$ will eventually be quite large compared with $\sigma$.


Proof By definition, we can find frequencies $\xi_1, \ldots, \xi_K$ and coefficients $c_1, \ldots, c_K$ of magnitude $O(1)$ such that

\[ f(x) = \sum_{j=1}^{K} c_j e(x \cdot \xi_j) + g(x) \]

for all $x \in Z$, where $g$ has an $L^2(Z)$ norm of at most $\sigma$. Now let $S := \{\xi_1, \ldots, \xi_K\}$ and let $\rho > 0$ be a radius to be chosen later. If $h$ lies in the Bohr set $\mathrm{Bohr}_Z(S, \rho)$, then $e(h \cdot \xi_j) = 1 + O(\rho)$ for each frequency $\xi_j$, and hence

\[ T^{jh} f(x) = f(x) + O(K\rho) + T^{jh} g(x) - g(x) \]

for $j = 1, 2$, where $T^h f(x) = f(x + h)$ denotes the shift by $h$. In particular we have

\[ \|T^{jh} f - f\|_{L^2(Z)} \leq O(K\rho) + 2\sigma, \]




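while from the boundedness of $f$ we have $\|T^{jh} f\|_{L^\infty(Z)} \leq 1$. After a few applications of the triangle inequality and Hölder's inequality, we then conclude that

\[ \|f \, T^h f \, T^{2h} f - f^3\|_{L^1(Z)} \leq O(K\rho) + 4\sigma \]

and hence by the triangle inequality again

\[ \mathbf{E}_{x \in Z} f(x) T^h f(x) T^{2h} f(x) \geq \mathbf{E}_{x \in Z}(f(x)^3) - O(K\rho) - 4\sigma. \]

On the other hand, from Hölder's inequality we have

\[ \mathbf{E}_{x \in Z} f(x)^3 \geq (\mathbf{E}_{x \in Z} f(x))^3 \geq \delta^3 \]

so by hypothesis on $\sigma$

\[ \mathbf{E}_{x \in Z} f(x) T^h f(x) T^{2h} f(x) \geq \frac{1}{2}\delta^3 - O(K\rho). \]

Applying (4.25) and the positivity of $f$, we conclude that

\[ \Lambda_3(f, f, f) = \mathbf{E}_{x, h \in Z} f(x) T^h f(x) T^{2h} f(x) \geq \rho^K \max\Big( \frac{1}{2}\delta^3 - O(K\rho), 0 \Big). \]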











The claim then follows by taking $\rho$ to be a sufficiently small multiple of $\delta^3/K$.
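The mechanism behind this proof can be checked numerically in a small cyclic group. The following is a minimal sketch, assuming $Z = Z_N$ with $\xi \cdot x = \xi x / N$ and the Bohr set defined by $\|h\xi/N\|_{\mathbf{R}/\mathbf{Z}} < \rho$; the helper names (`bohr_set`, `shift`, `l2_norm`) and the sample frequencies are illustrative choices, not notation from the text. It verifies that shifting a $K$-quasiperiodic function by an element of the Bohr set moves it by at most $2\pi K \rho$ in $L^2(Z)$.

```python
import cmath
import math

def e(t):
    """The standard character e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * math.pi * t)

def circle_dist(t):
    """Distance from the real number t to the nearest integer."""
    return abs(t - round(t))

def bohr_set(N, S, rho):
    """All h in Z_N with ||h*xi/N|| < rho for every frequency xi in S."""
    return [h for h in range(N) if all(circle_dist(h * xi / N) < rho for xi in S)]

def l2_norm(F):
    """Normalised L^2(Z) norm: square root of the average of |F(x)|^2."""
    return math.sqrt(sum(abs(v) ** 2 for v in F) / len(F))

def shift(F, h):
    """T^h F(x) = F(x + h) on the cyclic group Z_N."""
    N = len(F)
    return [F[(x + h) % N] for x in range(N)]

# Illustrative data (assumptions of this sketch): N = 101, K = 2 frequencies.
N = 101
S = [3, 17]
coeffs = [0.5, 0.25]          # coefficients c_j with |c_j| <= 1
K = len(S)
f = [sum(c * e(x * xi / N) for c, xi in zip(coeffs, S)) for x in range(N)]

rho = 0.05
B = bohr_set(N, S, rho)
# Largest L^2 displacement ||T^h f - f|| over all shifts h in the Bohr set.
worst = max(l2_norm([a - b for a, b in zip(shift(f, h), f)]) for h in B)
```

Here `worst` is bounded by $2\pi K\rho$ because $|e(t) - 1| \leq 2\pi \|t\|_{\mathbf{R}/\mathbf{Z}}$; this is the $O(K\rho)$ term of the proof with $g = 0$.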





To establish Theorem 10.33 in the general case, one now needs to approximate 
an arbitrary function f by an almost periodic one. Indeed we will establish the 
following fundamental proposition: 


Proposition 10.36 (Koopman-von Neumann decomposition) Let $f : Z \to \mathbf{R}^+$ be such that $0 \leq f(x) \leq 1$, let $\sigma > 0$, and let $F : \mathbf{R}^+ \times \mathbf{R}^+ \to \mathbf{R}^+$ be an arbitrary function. Then there exists a quantity $K = O_{\sigma,F}(1)$ and a decomposition $f = f_{U^\perp} + f_U$ with the following properties:

• the “anti-uniform” component $f_{U^\perp}$ obeys the bounds $0 \leq f_{U^\perp} \leq 1$ and $\mathbf{E}_Z f_{U^\perp} = \mathbf{E}_Z f$, and is $(K, \sigma)$-almost periodic;
• the “uniform” component $f_U$ obeys the Fourier uniformity estimate

\[ \|f_U\|_{u^2(Z)} \leq \frac{1}{F(\sigma, K)}. \]

A remarkable feature of this proposition is that one can make the uniformity 
control on fy arbitrarily strong by making F grow arbitrarily quickly. The price 
one pays for this is that the upper bound on K then deteriorates substantially. 

We shall prove Proposition 10.36 in the rest of this section. For now, let us see how the proposition implies Theorem 10.33. We apply the proposition with $\sigma := \delta^3/8$ and $F$ to be chosen later. From Proposition 10.35 we have

\[ \Lambda_3(f_{U^\perp}, f_{U^\perp}, f_{U^\perp}) = \Omega((\delta^3/K)^K \delta^3). \]




Since $f$, $f_{U^\perp}$ are bounded between 0 and 1, $f_U$ is bounded in magnitude by 1. Applying the Fourier uniformity estimate and Proposition 10.11, we conclude that

\[ \Lambda_3(f, f, f) = \Lambda_3(f_{U^\perp}, f_{U^\perp}, f_{U^\perp}) + O\Big( \frac{1}{F(\sigma, K)} \Big). \]

Thus if we choose $F$ to be sufficiently quickly growing, we can absorb the error term into the main term and conclude that

\[ \Lambda_3(f, f, f) = \Omega((\delta^3/K)^K \delta^3). \]

Since $K = O_{\sigma,F}(1) = O_\delta(1)$, the claim follows.

It remains to prove Proposition 10.36. One can prove this proposition by a 
direct application of the Fourier transform (this is essentially the approach in 
[34]); however we shall use a more ergodic approach which extends more easily to 
progressions of longer length. A crucial tool here is that of conditional expectation. 


Definition 10.37 (Conditional expectation) Define a $\sigma$-algebra of $Z$ to be any collection $\mathcal{B}$ of subsets of $Z$ which contains $\emptyset$ and $Z$, and is closed under unions, intersections, and complements. (The $\sigma$-algebras are in one-to-one correspondence with partitions of $Z$, and can be viewed as such.) If $\mathcal{B}$, $\mathcal{B}'$ are two $\sigma$-algebras, we define $\mathcal{B} \vee \mathcal{B}'$ to be the smallest $\sigma$-algebra which contains both. We say that a function $f : Z \to \mathbf{C}$ is measurable with respect to $\mathcal{B}$ if it is constant on every atom of $\mathcal{B}$, where an atom is any minimal non-empty element of $\mathcal{B}$. Given any $f : Z \to \mathbf{C}$, we define the conditional expectation $\mathbf{E}(f|\mathcal{B}) : Z \to \mathbf{C}$ to be the function

\[ \mathbf{E}(f|\mathcal{B})(x) := \mathbf{E}_{y \in \mathcal{B}(x)} f(y) = \frac{1}{|\mathcal{B}(x)|} \sum_{y \in \mathcal{B}(x)} f(y) \]

where $\mathcal{B}(x)$ is the unique atom of $\mathcal{B}$ which contains $x$; equivalently, $\mathbf{E}(f|\mathcal{B})$ is the orthogonal projection in $L^2(Z)$ to the space of $\mathcal{B}$-measurable functions.
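Since a $\sigma$-algebra on a finite set is just a partition, conditional expectation is averaging over atoms. The following is a minimal sketch in exact rational arithmetic, assuming $Z = Z_{12}$ and a partition into residue classes mod 3 (both arbitrary illustrative choices, as are the function names). It checks the orthogonal projection property and the resulting Pythagoras identity.

```python
from fractions import Fraction

def atoms_from_labels(labels):
    """Group the points 0..N-1 into atoms according to their label."""
    atoms = {}
    for x, lab in enumerate(labels):
        atoms.setdefault(lab, []).append(x)
    return list(atoms.values())

def cond_exp(f, atoms):
    """E(f|B): replace f on each atom by its average over that atom."""
    out = [None] * len(f)
    for atom in atoms:
        avg = sum(f[x] for x in atom) / len(atom)
        for x in atom:
            out[x] = avg
    return out

def inner(f, g):
    """Normalised real inner product <f, g> = E_x f(x) g(x)."""
    return sum(a * b for a, b in zip(f, g)) / len(f)

N = 12
f = [Fraction(x * x % 7, 6) for x in range(N)]          # some f with 0 <= f <= 1
atoms = atoms_from_labels([x % 3 for x in range(N)])    # atoms = residue classes mod 3
Ef = cond_exp(f, atoms)
residual = [a - b for a, b in zip(f, Ef)]

# Orthogonality of f - E(f|B) to the B-measurable function E(f|B),
# and the Pythagoras identity that follows from it.
orthogonal = inner(residual, Ef) == 0
pythagoras = inner(f, f) == inner(Ef, Ef) + inner(residual, residual)
```

Using `Fraction` keeps every average exact, so the two identities hold with equality rather than up to rounding.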


It turns out that certain $\sigma$-algebras $\mathcal{B}$ are “compact” in the sense that conditional expectations such as $\mathbf{E}(f|\mathcal{B})$ are automatically almost periodic. One precise formulation of this is

Proposition 10.38 (Characters generate compact $\sigma$-algebras) Let $\xi \in Z$ and $0 < \varepsilon < 1$. Then there exists a $\sigma$-algebra $\mathcal{B}_{\varepsilon,\xi}$ with $O_\varepsilon(1)$ atoms which approximately contains the character $e_\xi(x) := e(\xi \cdot x)$ in the sense that

\[ \|e_\xi - \mathbf{E}(e_\xi|\mathcal{B}_{\varepsilon,\xi})\|_{L^\infty(Z)} = O(\varepsilon), \qquad (10.20) \]

and also has the property that every $\mathcal{B}_{\varepsilon,\xi}$-measurable function $f$ with $\|f\|_{L^\infty(Z)} \leq 1$ is $(O_{\varepsilon,\sigma}(1), O(\sigma))$-almost periodic for every $\sigma > 0$.




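Proof We use the first moment method. Let $\alpha$ be a randomly selected element of the unit square $Q := \{z \in \mathbf{C} : 0 \leq \mathrm{Re}(z), \mathrm{Im}(z) < 1\}$, and let $\mathcal{B}_{\varepsilon,\xi}$ be the $\sigma$-algebra generated by the sets

\[ A_{a,b,\varepsilon,\alpha} := \{x \in Z : e_\xi(x) \in \varepsilon(Q + a + bi + \alpha)\}; \quad a, b \in \mathbf{Z}. \]

These sets, which partition $Z$, are essentially translates of the Bohr set $\mathrm{Bohr}_Z(\{\xi\}, \varepsilon)$; at most $O(1/\varepsilon)$ of them are non-empty. Since $e_\xi$ fluctuates by at most $O(\varepsilon)$ on each such set, we obtain the property (10.20). Now we prove the latter property. Observe that $f$ is a linear combination of at most $O(1/\varepsilon)$ indicator functions $1_{A_{a,b,\varepsilon,\alpha}}$ with bounded coefficients, so it suffices to prove the claim for the $O(1/\varepsilon)$ non-trivial indicator functions $1_{A_{a,b,\varepsilon,\alpha}}$. The claim is trivial for $\sigma \geq 1$, and by approximating $\sigma$ by the nearest power of 2 we thus see that it suffices to verify the claim for $\sigma = 2^{-n}$ for integer $n \geq 0$. By the Borel-Cantelli lemma it will thus suffice to show that

\[ \mathbf{P}\big( 1_{A_{a,b,\varepsilon,\alpha}} \text{ is } (O_{\varepsilon,n}(1), O(2^{-n}))\text{-almost periodic} \big) = 1 - O(\varepsilon 2^{-n}) \]

for each $n \geq 1$ and $a, b \in \mathbf{Z}$.

Fix $n$, $a$, $b$. We rewrite

\[ 1_{A_{a,b,\varepsilon,\alpha}}(x) = 1_Q\Big( \frac{e(x \cdot \xi)}{\varepsilon} - a - bi - \alpha \Big). \]

Let $B$ be the $\varepsilon 2^{-3n}$-neighborhood of the boundary of the square $Q$. From Urysohn's lemma followed by the Weierstrass approximation theorem, we can write

\[ 1_Q(z) = P_{n,\varepsilon}(z) + O(1_B(z)) + O(2^{-n}), \]

where $P_{n,\varepsilon}(z)$ is a polynomial of $z$ and $\bar z$ depending only on $n$ and $\varepsilon$. We conclude that

\[ 1_{A_{a,b,\varepsilon,\alpha}}(x) = P_{n,\varepsilon}\Big( \frac{e(x \cdot \xi)}{\varepsilon} - a - bi - \alpha \Big) + O\Big( I\Big( \frac{e(x \cdot \xi)}{\varepsilon} - a - bi - \alpha \in B \Big) \Big) + O(2^{-n}). \qquad (10.21) \]

The first term on the right-hand side can be easily verified to be $O_{n,\varepsilon}(1)$-quasiperiodic. An application of the first moment method easily shows that

\[ \mathbf{E}\Big( \Big\| I\Big( \frac{e(x \cdot \xi)}{\varepsilon} - a - bi - \alpha \in B \Big) \Big\|_{L^2(Z)}^2 \Big) = O(\varepsilon 2^{-3n}), \]

so by Markov's inequality we see that the second term in (10.21) has an $L^2(Z)$ norm of $O(2^{-n})$ with probability $1 - O(\varepsilon 2^{-n})$. We thus see that $1_{A_{a,b,\varepsilon,\alpha}}$ is $(O_{n,\varepsilon}(1), O(2^{-n}))$-almost periodic with probability $1 - O(\varepsilon 2^{-n})$, as desired.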














One can extend this to the $\sigma$-algebra generated by multiple characters:




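Corollary 10.39 Let $\xi_1, \ldots, \xi_n \in Z$ and $\varepsilon_1, \ldots, \varepsilon_n > 0$. Let $\mathcal{B} := \mathcal{B}_{\varepsilon_1,\xi_1} \vee \cdots \vee \mathcal{B}_{\varepsilon_n,\xi_n}$, where $\mathcal{B}_{\varepsilon,\xi}$ was defined in the previous proposition. Then every $\mathcal{B}$-measurable function $f$ with $\|f\|_{L^\infty(Z)} \leq 1$ is $(O_{\varepsilon_1,\ldots,\varepsilon_n,n,\sigma}(1), O_n(\sigma))$-almost periodic for every $\sigma > 0$.


Proof Observe that $\mathcal{B}$ has at most $O_{n,\varepsilon_1,\ldots,\varepsilon_n}(1)$ atoms and so it suffices to verify the claim for an indicator $f = 1_A$, where $A$ is an atom of $\mathcal{B}$. But $1_A$ is then a product of $n$ indicators $1_{A_1} \cdots 1_{A_n}$, where $A_j$ is an atom of $\mathcal{B}_{\varepsilon_j,\xi_j}$, and the claim then follows from the previous proposition and the observation that the product of bounded almost periodic functions remains bounded and almost periodic (but with slightly worse constants).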














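The heart of the proof of Proposition 10.36 now lies in the following key lemma. We define the energy $\mathcal{E}_f(\mathcal{B})$ of $\mathcal{B}$ with respect to $f$ to be the quantity

\[ \mathcal{E}_f(\mathcal{B}) := \|\mathbf{E}(f|\mathcal{B})\|_{L^2(Z)}^2 = \mathbf{E}_{x \in Z} |\mathbf{E}(f|\mathcal{B})(x)|^2. \]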


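Lemma 10.40 (Lack of uniformity implies energy increment) Let $\varepsilon, \mu > 0$ be such that $\varepsilon \leq \mu/4$, and let $f : Z \to \mathbf{R}^+$ be such that $0 \leq f(x) \leq 1$, and let $\mathcal{B}$ be a $\sigma$-algebra such that

\[ \|f - \mathbf{E}(f|\mathcal{B})\|_{u^2(Z)} \geq \mu. \]

Then there exists a frequency $\xi \in Z$ such that we have the energy increment property

\[ \mathcal{E}_f(\mathcal{B} \vee \mathcal{B}_{\varepsilon,\xi}) \geq \mathcal{E}_f(\mathcal{B}) + \mu^2/4. \]

Proof By definition of $u^2(Z)$, we can find $\xi \in Z$ such that

\[ |\langle f - \mathbf{E}(f|\mathcal{B}), e_\xi \rangle_{L^2(Z)}| \geq \mu. \]

On the other hand, from (10.20) we see that $e_\xi$ fluctuates by at most $2\varepsilon$ on each atom of $\mathcal{B}_{\varepsilon,\xi}$, and hence on each atom of $\mathcal{B} \vee \mathcal{B}_{\varepsilon,\xi}$. Thus

\[ \|e_\xi - \mathbf{E}(e_\xi|\mathcal{B} \vee \mathcal{B}_{\varepsilon,\xi})\|_{L^\infty(Z)} \leq 2\varepsilon; \]

since $f - \mathbf{E}(f|\mathcal{B})$ is bounded in magnitude by 1, we conclude

\[ |\langle f - \mathbf{E}(f|\mathcal{B}), e_\xi - \mathbf{E}(e_\xi|\mathcal{B} \vee \mathcal{B}_{\varepsilon,\xi}) \rangle_{L^2(Z)}| \leq 2\varepsilon. \]

Since $\varepsilon \leq \mu/4$, we deduce

\[ |\langle f - \mathbf{E}(f|\mathcal{B}), \mathbf{E}(e_\xi|\mathcal{B} \vee \mathcal{B}_{\varepsilon,\xi}) \rangle_{L^2(Z)}| \geq \mu/2. \]

From the easily verified identity

\[ \langle f - \mathbf{E}(f|\mathcal{B} \vee \mathcal{B}_{\varepsilon,\xi}), \mathbf{E}(e_\xi|\mathcal{B} \vee \mathcal{B}_{\varepsilon,\xi}) \rangle_{L^2(Z)} = 0 \]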




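we thus have

\[ |\langle \mathbf{E}(f|\mathcal{B} \vee \mathcal{B}_{\varepsilon,\xi}) - \mathbf{E}(f|\mathcal{B}), \mathbf{E}(e_\xi|\mathcal{B} \vee \mathcal{B}_{\varepsilon,\xi}) \rangle_{L^2(Z)}| \geq \mu/2 \]

and hence by Cauchy-Schwarz

\[ \|\mathbf{E}(f|\mathcal{B} \vee \mathcal{B}_{\varepsilon,\xi}) - \mathbf{E}(f|\mathcal{B})\|_{L^2(Z)}^2 \geq \mu^2/4. \]

The claim then follows from Pythagoras' theorem.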
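The energy increment can be observed numerically. The sketch below works in $Z_{60}$ and makes one loudly stated simplification: in place of the randomly shifted squares of Proposition 10.38 it uses arcs of the unit circle of angular width at most $\mu/2$ as a stand-in for $\mathcal{B}_{\varepsilon,\xi}$, which is enough for the fluctuation bound used in the proof. The function names and the test function $f$ are illustrative assumptions.

```python
import cmath
import math

def e(t):
    """The standard character e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * math.pi * t)

def cond_exp(f, labels):
    """E(f|B), where the atoms of B are the level sets of `labels`."""
    sums, counts = {}, {}
    for v, lab in zip(f, labels):
        sums[lab] = sums.get(lab, 0.0) + v
        counts[lab] = counts.get(lab, 0) + 1
    return [sums[lab] / counts[lab] for lab in labels]

def energy(f, labels):
    """Energy of B with respect to f: the average of |E(f|B)(x)|^2."""
    return sum(abs(v) ** 2 for v in cond_exp(f, labels)) / len(f)

def fourier(F, xi):
    """<F, e_xi> = E_x F(x) e(-x*xi/N)."""
    N = len(F)
    return sum(F[x] * e(-x * xi / N) for x in range(N)) / N

N = 60
f = [1.0 if (x * x) % N < N // 3 else 0.0 for x in range(N)]   # a 0/1-valued f
B = [0] * N                                                     # trivial sigma-algebra

# Frequency with the largest Fourier coefficient of f - E(f|B); mu is the u^2 norm.
residual = [a - b for a, b in zip(f, cond_exp(f, B))]
xi = max(range(N), key=lambda t: abs(fourier(residual, t)))
mu = abs(fourier(residual, xi))

# Stand-in for B_{eps,xi}: M arcs of angular width 2*pi/M <= mu/2 (an assumption).
M = math.ceil(4 * math.pi / mu)
arc = [((x * xi) % N) * M // N for x in range(N)]
joined = list(zip(B, arc))                 # labels of the join of the two partitions
gain = energy(f, joined) - energy(f, B)
```

With this choice of arcs the argument of the lemma applies verbatim, so `gain` should be at least `mu ** 2 / 4` up to rounding.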














We now have enough tools to prove Proposition 10.36. 


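Proof of Proposition 10.36 We construct a nested pair of $\sigma$-algebras $\mathcal{B} \subset \mathcal{B}'$ and an integer $K \geq 1$ by the following double-loop algorithm.

• Step 0. Initialize $\mathcal{B} = \{\emptyset, Z\}$.

• Step 1. Let $K$ be the smallest integer such that $\mathbf{E}(f|\mathcal{B})$ is $(K, \sigma/2)$-almost periodic. (Note from the Fourier inversion formula that $K$ is finite.) Set $\mathcal{B}' := \mathcal{B}$; thus we trivially have $\mathcal{E}_f(\mathcal{B}') \leq \mathcal{E}_f(\mathcal{B}) + \sigma^2/4$.

• Step 2. If

\[ \|f - \mathbf{E}(f|\mathcal{B}')\|_{u^2(Z)} \leq \frac{1}{F(\sigma, K)} \]

then we terminate the algorithm. If not, then we can apply Lemma 10.40 with $\mu := \frac{1}{F(\sigma,K)}$ and $\varepsilon := \frac{1}{4F(\sigma,K)}$ to obtain a new $\sigma$-algebra $\mathcal{B}'' := \mathcal{B}' \vee \mathcal{B}_{\varepsilon,\xi}$ for some $\xi \in Z$ such that

\[ \mathcal{E}_f(\mathcal{B}'') \geq \mathcal{E}_f(\mathcal{B}') + \frac{1}{4F(\sigma, K)^2}. \]

• Step 3. If we have

\[ \mathcal{E}_f(\mathcal{B}'') \leq \mathcal{E}_f(\mathcal{B}) + \sigma^2/4 \]

then we set $\mathcal{B}' := \mathcal{B}''$ and return to Step 2. If instead we have

\[ \mathcal{E}_f(\mathcal{B}'') > \mathcal{E}_f(\mathcal{B}) + \sigma^2/4 \]

then we set $\mathcal{B} := \mathcal{B}''$ and return to Step 1.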
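The control flow of Steps 0 to 3 can be sketched in code. This is only an illustration under two assumptions that are not from the text: the refinement in Step 2 uses circular arcs of width at most $\mu/2$ instead of the $\sigma$-algebras of Proposition 10.38, and the minimal $K$ of Step 1 is replaced by the hypothetical helper `complexity_proxy`, which counts how many of the largest Fourier coefficients are needed for an $L^2$ tail of at most $\sigma/2$ (an upper bound for the minimal $K$, by Parseval).

```python
import cmath
import math

def e(t):
    return cmath.exp(2j * math.pi * t)

def cond_exp(f, labels):
    """E(f|B), where the atoms of B are the level sets of `labels`."""
    sums, counts = {}, {}
    for v, lab in zip(f, labels):
        sums[lab] = sums.get(lab, 0.0) + v
        counts[lab] = counts.get(lab, 0) + 1
    return [sums[lab] / counts[lab] for lab in labels]

def energy(f, labels):
    return sum(v * v for v in cond_exp(f, labels)) / len(f)

def fourier_coeffs(g):
    N = len(g)
    return [sum(g[x] * e(-x * xi / N) for x in range(N)) / N for xi in range(N)]

def complexity_proxy(g, tol):
    """Hypothetical stand-in for Step 1: number of top Fourier modes with L^2 tail <= tol."""
    mags = sorted((abs(c) ** 2 for c in fourier_coeffs(g)), reverse=True)
    tail, K = sum(mags), 0
    while K < len(mags) and math.sqrt(max(tail, 0.0)) > tol:
        tail -= mags[K]
        K += 1
    return max(K, 1)

def decompose(f, sigma, F):
    N = len(f)
    B = [0] * N                                            # Step 0: trivial partition
    while True:
        K = complexity_proxy(cond_exp(f, B), sigma / 2)    # Step 1
        B1 = list(B)
        while True:
            f_perp = cond_exp(f, B1)
            coeffs = fourier_coeffs([a - b for a, b in zip(f, f_perp)])
            xi = max(range(N), key=lambda t: abs(coeffs[t]))
            if abs(coeffs[xi]) <= 1 / F(sigma, K):         # Step 2: uniform enough, stop
                return K, B, B1, f_perp
            M = math.ceil(4 * math.pi * F(sigma, K))       # arcs of width <= mu/2
            B2 = [(lab, ((x * xi) % N) * M // N) for x, lab in enumerate(B1)]
            if energy(f, B2) <= energy(f, B) + sigma ** 2 / 4:   # Step 3
                B1 = B2                                    # small gain: back to Step 2
            else:
                B = B2                                     # large gain: back to Step 1
                break

# Illustrative run (the function f, sigma and F are arbitrary choices).
f = [1.0 if (x * x) % 30 < 10 else 0.0 for x in range(30)]
sigma = 0.5
F = lambda s, K: 4.0 * K
K, B, B1, f_perp = decompose(f, sigma, F)
f_U = [a - b for a, b in zip(f, f_perp)]
```

On exit `f_perp` plays the role of $\mathbf{E}(f|\mathcal{B}')$ and `f_U` that of $f - \mathbf{E}(f|\mathcal{B}')$; the growth function `F` trades a stronger uniformity bound on `f_U` against more refinement steps.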


Observe that every time we return from Step 3 to Step 2, the energy $\mathcal{E}_f(\mathcal{B}')$ increases by at least $\frac{1}{4F(\sigma,K)^2}$ while $K$ does not change. On the other hand, since $f$ is bounded, $\mathcal{E}_f(\mathcal{B}')$ varies between 0 and 1. Thus we can only return from Step 3 to Step 2 at most $4F(\sigma, K)^2$ times before either terminating or returning to Step 1. Now, every time one returns from Step 3 to Step 1, the energy $\mathcal{E}_f(\mathcal{B})$ increases by at least $\sigma^2/4$, so one can only return from Step 3 to Step 1 at most $4/\sigma^2$ times. Thus this algorithm terminates after a finite number of steps. If we then set $f_{U^\perp} := \mathbf{E}(f|\mathcal{B}')$ and $f_U := f - \mathbf{E}(f|\mathcal{B}')$ we have $f = f_{U^\perp} + f_U$, that $\|f_U\|_{u^2(Z)} \leq \frac{1}{F(\sigma,K)}$, that







$0 \leq f_{U^\perp} \leq 1$, and $\mathbf{E}_Z f_{U^\perp} = \mathbf{E}_Z f$. Finally, from construction we have $\mathcal{E}_f(\mathcal{B}') \leq \mathcal{E}_f(\mathcal{B}) + \sigma^2/4$ and hence by Pythagoras's theorem $\|f_{U^\perp} - \mathbf{E}(f|\mathcal{B})\|_{L^2(Z)} \leq \sigma/2$. Since $\mathbf{E}(f|\mathcal{B})$ is $(K, \sigma/2)$-almost periodic by construction, we conclude that $f_{U^\perp}$ is $(K, \sigma)$-almost periodic.

The only remaining thing to verify is that $K = O_{\sigma,F}(1)$. Observe that at every stage, $\mathcal{B}$ and $\mathcal{B}'$ are the join of a finite number of $\sigma$-algebras of the form $\mathcal{B}_{\varepsilon,\xi}$. In particular, Corollary 10.39 applies to these $\sigma$-algebras. An easy induction argument then shows that at every stage of the iteration, $\mathcal{B}$ and $\mathcal{B}'$ are the join of at most $O_{\sigma,F}(1)$ $\sigma$-algebras, that the parameters $\varepsilon$ involved are bounded from below by $\Omega_{\sigma,F}(1)$, and the parameter $K$ is always bounded above by $O_{\sigma,F}(1)$. The claim follows.














Exercises 


10.5.1 Let $f, g : Z \to \mathbf{C}$ be functions bounded in magnitude by 1 which are both $(K, \sigma)$-almost periodic for some $0 < \sigma < 1$. Show that $f + g$ is $(2K, 2\sigma)$-almost periodic, and that $fg$ is $(K^2, 4\sigma)$-almost periodic.

10.5.2 Let $f : Z \to \mathbf{C}$ be $(K, \sigma)$-almost periodic. Show that one can cover the set $\{T^h f : h \in Z\} \subset L^2(Z)$ by at most $O_{K,\sigma}(1)$ balls of radius $2\sigma$ in the $L^2(Z)$ metric. Conclude that

\[ \mathbf{P}_{h \in Z}(\|T^h f - f\|_{L^2(Z)} \leq 4\sigma) = \Omega_{K,\sigma}(1) \]

which may help explain the terminology “almost periodic”. For a converse to this result, see Exercise 10.5.5 below.

10.5.3 Let $\xi_1, \ldots, \xi_n$ be a dissociated subset of $Z$. Using Rudin's inequality (Lemma 4.33), show that

\[ \mathbf{P}_{h \in Z}\Big( \sum_{j=1}^{n} |e(\xi_j \cdot h) - 1|^2 \leq n \Big) \leq \exp(-\Omega(n)). \qquad (10.22) \]


10.5.4 Let $f : Z \to \mathbf{C}$ be such that $\|f\|_{L^2(Z)} = \|\hat f\|_{l^2(Z)} \geq 4\sigma$ and $\|\hat f\|_{l^\infty(Z)} \leq \delta\sigma$ for some $\sigma, \delta > 0$. Establish the bound


Prez 2 leé -h)— IPI FE)? < 1) = 0(6°) 


EEZ 


for some absolute constant $c > 0$. (Hint: normalize $\|f\|_{L^2(Z)} = 1$, so $\sigma \leq 1/4$, and then let $\xi_1, \ldots, \xi_n$ be independent identical random variables with probability distribution $|\hat f(\xi)|^2$. Show that $\xi_1, \ldots, \xi_n$ are dissociated
with probability 1 — O(2”8), and apply (10.22) combined with the first 
moment method. Then optimize in n.) 




10.5.5 Let $f : Z \to \mathbf{C}$ be normalized so that $\|f\|_{L^2(Z)} = \|\hat f\|_{l^2(Z)} = 1$. Suppose that one can cover the set $\{T^h f : h \in Z\} \subset L^2(Z)$ by $M$ balls of radius $\sigma$ in the $L^2(Z)$ metric. Show that $f$ is $(O_{M,\sigma}(1), 4\sigma)$-almost periodic.
(Hint: use the pigeonhole principle and the Fourier transform to establish 
a lower bound for 


Prez (y je -h) — IPI FE)? < to) 
EEZ 

Remove the $K$ largest Fourier coefficients from $f$, for some $K = O_{M,\sigma}(1)$ to be chosen later, and apply the previous exercise to conclude an upper bound on the $l^2$ norm of the remaining Fourier coefficients.) This result, combined with Exercise 10.5.2, gives a way to define almost periodicity purely in terms of the precompactness of the orbit $\{T^h f : h \in Z\}$, without explicit mention of the Fourier transform.


10.6 The Szemerédi regularity lemma 


In the original proof of Szemerédi’s theorem (Theorem 10.1), Szemerédi intro- 
duced an important result in graph theory, the Szemerédi regularity lemma. This 
lemma has since become one of the main tools in discrete mathematics. It asserts, 
roughly speaking, that any dense large graph can be decomposed into a relatively 
small number of disjoint subgraphs, most of which behave pseudo-randomly. A 
more “ergodic” way of viewing the lemma is as an assertion that the indicator 
function of a graph can be decomposed into a “low-complexity” component and a 
“pseudo-random” component. 
To state the lemma, we need some notation. 


Definition 10.41 ($\varepsilon$-regularity) Let $G(V, E)$ be a graph. If $X$, $Y$ are disjoint non-empty subsets of $V$, we define the edge density $d(X, Y)$ between $X$ and $Y$ to be the quantity

\[ d(X, Y) := \mathbf{P}_{x \in X, y \in Y}(\{x, y\} \in E). \]

If $\varepsilon > 0$, we say that the pair $(X, Y)$ is $\varepsilon$-regular if we have

\[ |d(X', Y') - d(X, Y)| \leq \varepsilon \]

whenever $X' \subset X$, $Y' \subset Y$ are such that $|X'| \geq \varepsilon|X|$ and $|Y'| \geq \varepsilon|Y|$.
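For very small graphs the definition can be tested by brute force. The sketch below is exponential in $|X| + |Y|$ and is meant only to make the quantifiers concrete; the function names and the two example graphs (a complete bipartite graph, and a small "half graph" with an edge $\{v_i, w_j\}$ whenever $i < j$) are illustrative choices.

```python
from fractions import Fraction
from itertools import combinations
import math

def density(E, X, Y):
    """d(X, Y): the fraction of pairs (x, y) in X x Y that are edges of E."""
    hits = sum(1 for x in X for y in Y if frozenset((x, y)) in E)
    return Fraction(hits, len(X) * len(Y))

def subsets_of_size_at_least(X, m):
    """All subsets of X with at least max(m, 1) elements."""
    for k in range(max(m, 1), len(X) + 1):
        yield from combinations(X, k)

def is_regular(E, X, Y, eps):
    """Brute-force check of eps-regularity of the pair (X, Y)."""
    d = density(E, X, Y)
    min_x = math.ceil(eps * len(X))     # |X'| >= eps|X| for integer sizes
    min_y = math.ceil(eps * len(Y))
    return all(abs(density(E, X1, Y1) - d) <= eps
               for X1 in subsets_of_size_at_least(X, min_x)
               for Y1 in subsets_of_size_at_least(Y, min_y))

eps = Fraction(1, 2)

# Complete bipartite graph on 3 + 3 vertices: every sub-pair has density 1.
X3 = [("v", i) for i in range(1, 4)]
Y3 = [("w", j) for j in range(1, 4)]
complete = {frozenset((x, y)) for x in X3 for y in Y3}

# Half graph on 4 + 4 vertices: v_i is joined to w_j exactly when i < j.
X4 = [("v", i) for i in range(1, 5)]
Y4 = [("w", j) for j in range(1, 5)]
half = {frozenset((("v", i), ("w", j)))
        for i in range(1, 5) for j in range(1, 5) if i < j}

complete_is_regular = is_regular(complete, X3, Y3, eps)
half_is_regular = is_regular(half, X4, Y4, eps)
```

The half graph fails the test because the sub-pair $\{v_1, v_2\}$, $\{w_3, w_4\}$ has density 1 while the whole pair has density $3/8$.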


A partition $V = V_1 \cup V_2 \cup \cdots \cup V_k$ is near-uniform if $-1 \leq |V_i| - |V_j| \leq 1$ for all $i, j$. Szemerédi's Regularity Lemma asserts that given a positive constant $\varepsilon$ and a graph $G$, one can find a near-uniform partition of $V$ in not too many parts so that most of the pairs $(V_i, V_j)$ are $\varepsilon$-regular.




Lemma 10.42 (Regularity Lemma) Let $\varepsilon$ be a positive constant, $m \geq 1$ an integer, and $G = G(V, E)$ a graph. If $|V|$ is sufficiently large depending on $\varepsilon$ and $m$, then there exists a near-uniform partition $V = V_1 \cup \cdots \cup V_k$ for some $m \leq k \leq O_{\varepsilon,m}(1)$ such that all but at most $\varepsilon k^2$ of the pairs $(V_i, V_j)$ are $\varepsilon$-regular.


Remark 10.43 The Regularity Lemma does not assert that all pairs $(V_i, V_j)$ are regular, only that a proportion $1 - \varepsilon$ of the pairs are. In fact, there are examples showing that one cannot expect regularity of all the pairs (Exercise 10.6.5).


Remark 10.44 The theorem requires |V| to be large depending on £ and m, or 
to put it another way, one needs ¢ = Ojy\-,o0:m(1). The proof of the Regularity 
Lemma allows us to have € = On (egy) where log, is the inverse to the tower 
exponential $n \mapsto e \uparrow\uparrow n$, defined recursively by $e \uparrow\uparrow 1 = e$ and $e \uparrow\uparrow (n + 1) := \exp(e \uparrow\uparrow n)$. Quite amazingly, Gowers has shown that this bound is essentially tight,
namely, for any sufficiently large |V |, there are graphs where one cannot find an 
€-regular partition with e larger than € = TAC 

The proof of the regularity lemma can be found in various textbooks on graph 
theory; in Section 11.6 we shall give a proof of this lemma using “ergodic” tech- 
niques similar to that of the previous section. See also [359] for an information- 
theoretic perspective on the lemma, and [239] for an analytic perspective. 

The survey paper [208] contains a wide range of applications of the regularity lemma. In this section, we restrict ourselves to a few applications in additive combinatorics, and in particular to Roth's theorem.

To prove Roth's theorem via the regularity lemma, it is convenient to first prove some graph-theoretic results. Let $G = G(V, E)$ be a graph. A set $\{e_1, \ldots, e_k\}$ in $E$ forms a matching if $e_1, \ldots, e_k$ are mutually disjoint. A matching is induced if the subgraph spanned by its endpoints does not contain any edge other than those already in the matching.


Proposition 10.45 [304] Let $G = G(V, E)$ be a graph whose edge set is the union of $|V|$ induced matchings. Then $|E| = o_{|V| \to \infty}(|V|^2)$.


Proof The strategy will be to apply the regularity lemma, combined with the intu- 
itive fact that a dense £-regular graph cannot support any large induced matchings. 

Assume that the proposition failed. Then one could find an integer m > 1 and 
arbitrarily large graphs G(V, E) with |E| > £v? (say) such that each of the 
graphs G was the union of |V | induced matchings. 

Fix one of these large graphs. Applying the regularity lemma (with $\varepsilon := 1/m$) we obtain a partition $V = V_1 \cup \cdots \cup V_k$ with $m \leq k \leq O_m(1)$ with $|V_i| = \frac{1}{k}|V| + O(1)$ for all $i$, and such that all but at most $\frac{1}{m}k^2$ of the pairs $(V_i, V_j)$ are $\frac{1}{m}$-regular.




Call an edge $e$ of $G$ bad if one of the following three events occurs:

• $e$ is contained in one of the $V_i$;
• $e$ connects $V_i$ to $V_j$, where $d(V_i, V_j) < \frac{2}{m}$;
• $e$ connects $V_i$ to $V_j$, where $(V_i, V_j)$ is not $\frac{1}{m}$-regular.


One can easily verify that the total number of bad edges is at most 


IV|/k + OU) 2 (kV 1 |V? cee 
1 ee sae k < (Vi, 
(+ oy; wnt) (K( 2 T >) 2 ce 2 oma | 





if V is large enough depending on m. Thus if we let E’ C E be the edges of E that 
are not bad, we still have | E’| > živ |2. By the pigeonhole principle, we can thus 
find an induced matching F of G which contains at least živ] edges from E’. 
Call a set V; poor if it contains at most ZIV] vertices from F. If we delete all 
the poor sets V; (and their associated edges) from F, we will have deleted at most 
Ziv] edges in all. Thus the remaining matching F” will still contain an edge from 
$E'$. By definition, this edge connects two distinct sets $V_i$, $V_j$ which are not poor, which have edge density at least $\frac{2}{m}$, and which form a $\frac{1}{m}$-regular pair. If we let $V_{i,F}$ and $V_{j,F}$ be the vertices from $F$ in $V_i$, $V_j$ respectively, we thus have

\[ d(V_{i,F}, V_{j,F}) \geq d(V_i, V_j) - \frac{1}{m} \geq \frac{1}{m}. \]

On the one hand, since $F$ is an induced matching, the number of edges between $V_{i,F}$ and $V_{j,F}$ cannot exceed $|V_{j,F}|$, and so the edge density cannot exceed $1/|V_{i,F}|$. We conclude that

\[ |V_{i,F}| \leq m. \]


On the other hand, we have |V; r| > ZIV (since V; is not poor) and 
IV = zIV| + O(1). We conclude that |V| = Om,k(1) = Om(1), contradicting 
the hypothesis that V could be arbitrarily large depending on m. The claim 
follows. 














There are several equivalent formulations of the above theorem; see the exer- 
cises. A slightly stronger version of the theorem is as follows. 


Lemma 10.46 (Triangle removal lemma) [304] Let $G = G(V, E)$ be a graph which contains at most $\delta|V|^3$ triangles. Then it is possible to remove $o_{\delta \to 0}(|V|^2)$ edges from $G$ to obtain a graph which is triangle-free (it contains no triangles whatsoever).


Lemma 10.46 can be proven by the same method used to prove Proposition 10.45 
and is left as an exercise. In fact one can easily use Lemma 10.46 to deduce 
Proposition 10.45. 




Now we use Proposition 10.45 to give another proof of Roth’s theorem, Theo- 
rem 10.8. 


Proof Fix a finite additive group $Z$ of odd order, and a subset $A$ of $Z$ which contains no proper arithmetic progressions of length three. It suffices to show that $|A| = o_{|Z| \to \infty}(|Z|)$. We define a bipartite graph $G$ as follows. The color classes are the sets $Z \times \{1\}$ and $Z \times \{2\}$. We draw an edge between $(a + r, 1)$ and $(a + 2r, 2)$ for every $a \in Z$ and $r \in A$. For each $a \in Z$, the edges between $(a + r, 1)$, $(a + 2r, 2)$ for $r \in A$ form a matching. We claim that this matching is induced. For, if there was another edge connecting $(a + r, 1)$ with $(a + 2s, 2)$ for some distinct $r, s \in A$, then by construction we would have $2s - r \in A$. But then $r, s, 2s - r$ would be a proper progression of length three in $A$, a contradiction. Thus $G$ is the union of $|Z|$ induced matchings, and hence has at most $o_{|Z| \to \infty}(|Z|^2)$ edges. Since the number of edges in $G$ is clearly $|A||Z|$, the claim follows.
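The graph in this proof is easy to build explicitly. The following is a minimal sketch, assuming $Z = Z_{27}$ and the progression-free set $A = \{0, 1, 3, 4, 9, 10, 12, 13\}$ (integers with ternary digits 0 and 1 only; both are illustrative choices, as are the function names). It checks that each of the $|Z|$ matchings is induced, and that this fails for a set containing a progression.

```python
def roth_graph(N, A):
    """Edges (a+r, 1) -- (a+2r, 2) for a in Z_N, r in A, grouped by the parameter a."""
    edges = set()
    matchings = {}
    for a in range(N):
        M = [(((a + r) % N, 1), ((a + 2 * r) % N, 2)) for r in A]
        matchings[a] = M
        edges.update(M)
    return edges, matchings

def is_induced_matching(edges, M):
    """True if M is a matching and its endpoints span no edge outside M."""
    left = [u for u, _ in M]
    right = [v for _, v in M]
    if len(set(left)) != len(M) or len(set(right)) != len(M):
        return False                     # two edges of M share an endpoint
    spanned = {(u, v) for u in left for v in right if (u, v) in edges}
    return spanned == set(M)

N = 27                                   # odd order, as the proof requires
A = [0, 1, 3, 4, 9, 10, 12, 13]          # no proper 3-term progression mod 27
edges, matchings = roth_graph(N, A)
all_induced = all(is_induced_matching(edges, M) for M in matchings.values())

# A set containing the progression 0, 1, 2 breaks the induced property.
bad_edges, bad_matchings = roth_graph(N, [0, 1, 2])
bad_all_induced = all(is_induced_matching(bad_edges, M) for M in bad_matchings.values())
```

An edge joins $(u, 1)$ to $(v, 2)$ exactly when $v - u \in A$, which is why an extra edge between matching endpoints $a + r$ and $a + 2s$ forces $2s - r \in A$.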














In fact, the above methods yield the following stronger form of Roth’s theorem. 


Proposition 10.47 [3] Let $Z$ be a finite additive group, and let $A \subset Z \times Z$ be such that $A$ contains no “right-angled triangles” $(a, b)$, $(a, b + r)$, $(a + r, b)$ with $a, b, r \in Z$ and $r \neq 0$. Then $|A| = o_{|Z| \to \infty}(|Z|^2)$.


We leave the proof of Proposition 10.47 (and its connection to Roth’s theorem) 
to the exercises. 

It is of interest to obtain more quantitative bounds for the $o()$ terms in the above results. By using an explicitly quantitative formulation of the regularity lemma, one can sharpen the $o_{|V| \to \infty}(|V|^2)$ expression in Proposition 10.45 to $O(|V|^2/(\log_* |V|)^{1/5})$, and similarly for Lemma 10.46, Roth's theorem and Proposition 10.47. Thus the quantitative bounds achieved by this method compare poorly to those achieved by the Fourier method. (However, the graph-theoretical method is
slightly easier to extend to the case of general k; see the next chapter.) Given that 
the bounds of Roth’s theorem are significantly better than what is achieved by the 
regularity lemma, one is then naturally led to ask the following question: 


Question 10.48 [139] Prove Proposition 10.45 (or Lemma 10.46) without using 
the Regularity Lemma. Find a better quantitative bound. 


In the case of Proposition 10.47, there has been some recent progress on this 
question [381], [314]. In particular, the best known bound here is A = o(—4_), 


log log |Z| 
due to Shkredov [314]. 


Exercises 


10.6.1 [304] Show that Proposition 10.45 is equivalent to the following statement: Let $G(V, E)$ be a graph such that each edge is contained in at most one triangle. Then $|E| = o_{|V| \to \infty}(|V|^2)$.


10.6.2 ((6, 3)-theorem) [304] Show that Proposition 10.45 is equivalent to the
following statement: let G = G(V, E) be a 3-uniform hypergraph (thus
each “edge” in E is a collection {x, y, z} of three vertices in V) such that
there is no set of six vertices in V which contains three or more edges in
E. Then $|E| = o_{|V| \to \infty}(|V|^2)$.

10.6.3 [304] Show that Lemma 10.46 implies Proposition 10.45. (Hint: first
reduce to the case of a bipartite graph which is the union of induced
matchings. Add |V| additional vertices to the graph, one for each
induced matching, and connect each new vertex to all the vertices in an
induced matching. This creates a tripartite graph with rather few triangles,
but which requires many edges to be removed in order to make it
triangle-free.)

10.6.4 [304] Use the regularity lemma to prove Lemma 10.46.

10.6.5 [8] Let $V_1 = \{v_1, \ldots, v_n\}$, $V_2 = \{w_1, \ldots, w_n\}$ be disjoint collections of
vertices, let $V := V_1 \cup V_2$, and let $G = G(V_1, V_2, E)$ be the bipartite
graph formed from all those edges $\{v_i, w_j\}$ for which $i < j$. Use this to
show that even for very simple graphs one must require an exceptional
set of pairs $(V_i, V_j)$ which is not regular.

10.6.6 By modifying the proof of Roth’s theorem, use Proposition 10.45 to prove
Proposition 10.47.

10.6.7 [323] Use Lemma 10.46 to prove Proposition 10.47, without going
through Proposition 10.45. (Hint: consider a graph whose vertices are
the vertical lines $\{(a, b) : a = \mathrm{const}\}$, horizontal lines $\{(a, b) : b = \mathrm{const}\}$
and diagonal lines $\{(a, b) : a + b = \mathrm{const}\}$ in $Z^2$, and with two vertices
connected by an edge if their associated lines have distinct orientations
and intersect in a point in A.)

10.6.8 Show that Proposition 10.47 implies Roth’s theorem. (Hint: if $A \subset Z$,
consider sets of the form $\{(a, b) \in Z \times Z : a + 2b \in A\}$.)

10.6.9 [136] Let $V_1, V_2$ be disjoint finite sets, and let $f_1 : V_1 \to \{-1, +1\}$ and
$f_2 : V_2 \to \{-1, +1\}$ be functions. Let $G = G(V_1, V_2, E)$ be the bipartite
graph formed by creating an edge between $x_1 \in V_1$ and $x_2 \in V_2$ if and
only if $f_1(x_1) = f_2(x_2)$. Let $X_1 \subset V_1$ and $X_2 \subset V_2$ be non-empty, and let
$0 < \varepsilon < 1$. Show that if $(X_1, X_2)$ is $\varepsilon$-regular, then

\[ |\mathbf{E}_{x_1 \in X_1} f_1(x_1)|, \; |\mathbf{E}_{x_2 \in X_2} f_2(x_2)| \ge 1 - O(\varepsilon). \]

This shows that any partition of $V_1$ and $V_2$ into regular pairs will have
to essentially be a refinement of the sets $\{x_1 \in V_1 : f_1(x_1) = \pm 1\}$ and
$\{x_2 \in V_2 : f_2(x_2) = \pm 1\}$.

10.6.10 [136] Let V be a large finite set. Show that there exist n functions
$f_1, \ldots, f_n : V \to \{-1, +1\}$ for some $n = \Omega(\log |V|)$ with the property
that for any distinct $x, x' \in V$, we have $f_i(x) = f_i(x')$ for at most
$3n/4$ values of i, or in other words $|\mathbf{E}_{i \in [1,n]} f_i(x) f_i(x')| \le 1/2$. (Hint:
use the probabilistic method. Alternatively, identify V with an error-correcting
code in $\{-1, +1\}^n$, constructed for instance using the greedy
algorithm.) If $A : V \to \mathbf{R}^+$ is any function such that $\|A\|_{l^1(V)} = 1$ and
$\|A\|_{l^\infty(V)} \le 1 - \varepsilon$ for some $\varepsilon > 0$, show that

\[ \mathbf{E}_{i \in [1,n]} \Bigl| \sum_{x \in V} A(x) f_i(x) \Bigr|^2 \le 1 - \Omega(\varepsilon). \]

Conclude in particular that $|\sum_{x \in V} A(x) f_i(x)| \le 1 - \Omega(\varepsilon)$ for at least
$\Omega(\varepsilon n)$ values of i.

10.6.11 [136] Let V be a large finite set, and let $f_1, \ldots, f_n : V \to \{-1, +1\}$
be as in the preceding exercise. Let W be another large finite set, let
G be the graph with vertex set $[1, n] \times V \times W$, with any two distinct
vertices (i, x, w), (j, y, z) being connected by an edge if and only if
$f_i(y) = f_j(x)$. Let $\varepsilon > 0$, and suppose that $[1, n] \times V \times W$ is partitioned into
$[1, n] \times V \times W = V_1 \cup \cdots \cup V_k$ as in the regularity lemma. Suppose
further that for all but $O(\varepsilon k)$ of the sets $V_s$, there exists an $i_s \in [1, n]$
such that $|V_s \cap (\{i_s\} \times V \times W)| \ge (1 - O(\varepsilon))|V_s|$; thus up to errors of
$O(\varepsilon)$, most of the cells $V_s$ of the partition are essentially contained
in one of the $\{i\} \times V \times W$. Conclude that for all but $O(\varepsilon k)$ of the sets $V_s$,
there exists $i_s \in [1, n]$ and $x_s \in V$ such that $|V_s \cap (\{i_s\} \times \{x_s\} \times W)| \ge
(1 - O(\varepsilon))|V_s|$; thus any regular partition which essentially refines the
partition $\{\{i\} \times V \times W\}_i$ must automatically essentially refine the finer
partition $\{\{i\} \times \{x\} \times W\}_{i,x}$. (This is a more complicated version of
Exercise 10.6.9, and requires use of the previous exercise, with A(x) being
equal to the relative density of $V_s \cap (\{i_s\} \times \{x\} \times W)$ in $V_s \cap (\{i_s\} \times
V \times W)$.) An iteration of this fact can be used to establish a lower bound
of tower type for the Szemerédi regularity lemma; see [136].


10.7 Szemerédi’s argument 


In this section we give another proof of Roth’s theorem due to Szemerédi (see 
e.g. [143]). This argument gives slightly better bounds than that obtained from 
the regularity lemma, but still worse than that given from the Fourier-analytic 
argument. However, it has the advantage of being completely elementary and 
rather short. A more complex version of this argument was also used in [343] to 
establish Szemerédi’s theorem for progressions of length 4, but the general k case
requires a rather different (and even more complex) combinatorial argument which
we will not discuss here; see [345]. 

Intuitively, the idea is as follows. If A is a dense subset of an interval [1, N], then it
should contain a large cube $a + [0,1]^d \cdot v$. If A also has no proper progressions of
length three, then this implies that A must be disjoint from a sumset
$2a + [0,1]^d \cdot 2v - A_0$ of a large set and a cube. This disjointness “squeezes” A into a collection
of moderately long progressions, and the density of A must increase on at least 
one of them. This creates a density increment that one can then iterate as in the 
Fourier-analytic proof of Roth’s theorem. Thus one is using the disjointness from 
a sumset as a substitute for Fourier bias (cf. Exercise 4.3.12). 

We now give the main steps of the argument, leaving the proofs as exercises. 
First we need to show that dense sets contain cubes. 


Lemma 10.49 Let $0 < \delta \le 1$, let P be a large proper arithmetic progression,
and let A be a subset of P with $|A| \ge \delta |P|$. Then A contains a proper cube
$a + [0,1]^d \cdot v$ with $d = \Omega_\delta(\log \log |P|)$. In particular all the steps $v_1, \ldots, v_d$ are
non-zero.


The main point here is that the quantity d goes to infinity (somewhat slowly)
as $|P| \to \infty$. As a corollary of this lemma, we have


Corollary 10.50 Let $0 < \delta \le 1$, let N be a sufficiently large integer depending
on $\delta$, and let A be a subset of [1, N] with $|A| \ge \delta N$ which contains no proper
progressions of length three. Then at least one of the following statements is true:


• (Density increment) There exists a progression $P \subset [1, N]$ of length
$|P| \ge N/4 + O(1)$ such that $|A \cap P| \ge 1.1\,\delta |P|$.

• (Disjointness from sumset) There exists a set $A_0 \subset [1, N/4]$ and a cube
$a + [0,1]^d \cdot v \subset (N/4, N/2]$ with $d = \Omega_\delta(\log \log N)$ and all steps $v_1, \ldots, v_d$
non-zero such that $|A_0| = \Omega(\delta N)$ and the sumset $2a + [0,1]^d \cdot 2v - A_0 \subset [1, N]$
is disjoint from A.


Proof Without loss of generality we may assume that $|A \cap (iN/4, (i+1)N/4]|
\le 1.1\,\delta N/4 + O(1)$ for all i = 0, 1, 2, 3, since otherwise we have a density
increment. As $|A| \ge \delta N$, this implies that $|A \cap (iN/4, (i+1)N/4]| = \Omega(\delta N)$
for all i. Applying Lemma 10.49 to the set $A \cap (N/4, N/2]$ we see that this
set contains a cube $a + [0,1]^d \cdot v$ with the desired properties. If we then set
$A_0 := A \cap [1, N/4]$, the claim then follows by observing that whenever $x \in A_0$
and $y \in a + [0,1]^d \cdot v$, the sequence x, y, 2y − x is a proper arithmetic progression
and hence 2y − x cannot lie in A.
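The final observation of this proof is easy to test numerically. The following is a minimal sketch, not part of the original argument: it builds a progression-free set by a greedy rule (the helper `greedy_ap_free` and the choice N = 400 are our own illustrative assumptions), and checks that reflecting $A_0 = A \cap [1, N/4]$ through points of $A \cap (N/4, N/2]$ lands inside [1, N] but outside A. Any point y of the middle quarter may be used here, not only the points of a cube.

```python
def greedy_ap_free(n):
    """Greedily pick integers in [1, n] so that no three form an arithmetic progression."""
    chosen = []          # kept in increasing order
    members = set()
    for z in range(1, n + 1):
        # z is the largest element so far, so it can only close a progression
        # x < y < z, which happens exactly when x = 2*y - z is already chosen.
        if all((2 * y - z) not in members for y in chosen):
            chosen.append(z)
            members.add(z)
    return members


N = 400                                            # illustrative size (assumption)
A = greedy_ap_free(N)
A0 = {x for x in A if x <= N // 4}                 # A ∩ [1, N/4]
middle = {y for y in A if N // 4 < y <= N // 2}    # A ∩ (N/4, N/2]
reflected = {2 * y - x for x in A0 for y in middle}

print("sizes:", len(A), len(A0), len(middle), len(reflected))
print("reflected set inside [1, N]:", all(1 <= z <= N for z in reflected))
print("reflected set disjoint from A:", reflected.isdisjoint(A))
```

The reflected set plays the role of the sumset $2a + [0,1]^d \cdot 2v - A_0$ in the corollary.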














Suppose we are in the situation in the above corollary, and the “disjointness
from sumset” statement holds. Let $E_0 \subset E_1 \subset \cdots \subset E_d \subset [1, N]$ be the sets

\[ E_i := 2a + [0,1]^i \cdot (2v_1, \ldots, 2v_i) - A_0. \]

By the pigeonhole principle we can find $1 \le i \le d$ such that

\[ |E_i| \le |E_{i-1}| + O\Bigl(\frac{N}{d}\Bigr). \]

On the other hand, we have $E_i = E_{i-1} + \{0, 2v_i\}$. This shows that we can partition
the set $[1, N] \setminus E_i$ into $O(N/d)$ proper arithmetic progressions $P_1, \ldots, P_k$ of step $2v_i$
(see Exercises). Observe that

\[ \sum_{j=1}^k |P_j| = N - |E_i| \le N - |A_0| = (1 - \Omega(\delta)) N. \]

On the other hand, since A is disjoint from $E_i$, we have

\[ \sum_{j=1}^k |A \cap P_j| = |A| \ge \delta N. \]

We thus have

\[ \sum_{j=1}^k \bigl( |A \cap P_j| - (\delta + c\delta^2)|P_j| - cd \bigr) \ge 0 \]

for some small absolute constant c > 0. Thus by the pigeonhole principle, there
exists a progression $P_j$ such that $|P_j| \ge cd = \Omega_\delta(\log \log N)$ and $|A \cap P_j| \ge
(\delta + c\delta^2)|P_j|$. This establishes a density increment of A on a progression whose
length goes to infinity as $N \to \infty$. This is essentially Corollary 10.26 (but with
somewhat worse explicit constants) and one can now iterate this Corollary as before
to establish Roth’s theorem.
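The source of the gain $\delta \mapsto \delta + c\delta^2$ can be isolated in one line. The following worked computation is a sketch under a labelled assumption: we write $c' > 0$ for the implied constant in $|A_0| = \Omega(\delta N)$, so that $\sum_j |P_j| \le (1 - c'\delta)N$. It shows only that the average density of A on the union of the $P_j$ already exceeds $\delta$ by a quadratic amount; discarding the short progressions is what costs the extra term in the displayed sum above.

```latex
\[
\frac{\sum_{j=1}^k |A \cap P_j|}{\sum_{j=1}^k |P_j|}
\;\ge\; \frac{\delta N}{(1 - c'\delta) N}
\;=\; \frac{\delta}{1 - c'\delta}
\;\ge\; \delta (1 + c'\delta)
\;=\; \delta + c'\delta^2 ,
\]
% using 1/(1 - t) >= 1 + t for 0 <= t < 1, with t = c' \delta.
```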

A careful accounting of the bounds here yields the bound $r_3([1, N]) =
O(N / \log_* N)$, which is marginally better than the bounds obtained by the
regularity lemma.


Exercises 


10.7.1 Prove Lemma 10.49. (Hint: first prove the following preliminary statement:
if $|A| \ge \delta |P|$, and |P| is large depending on $\delta$, then there are
$\Omega(\delta^2 |P|)$ values of v such that $|A \cap (A + v)| = \Omega(\delta^2 |P|)$.)

10.7.2 Let $A \subset [1, N]$ and $r \ne 0$ be such that $A + r \subset [1, N]$ and $|A + \{0, r\}| \le
|A| + k$. Show that $[1, N] \setminus (A + \{0, r\})$ can be partitioned into O(k) disjoint
arithmetic progressions of step r.


11 





Szemerédi’s theorem for k > 3 


In this chapter we continue the study of Szemerédi’s theorem (Theorem 10.1), 
but now focus on the case of longer arithmetic progressions k > 3. While we 
have seen that the k = 3 case of this theorem can be treated by Fourier-analytic 
methods, it turns out that the higher-k case cannot be dealt with by (linear) Fourier- 
analytic tools, even when k = 4; we will see some justifications for this fact later. 
Indeed, whereas Roth’s treatment [287] of the k = 3 case appeared in 1953, it was 
only in 1969 that Szemerédi [343] established the k = 4 case of Theorem 10.1, 
by combining the density increment argument of Roth with some impressively 
complicated combinatorial arguments (based on those discussed in Section 10.7). 
Unfortunately, this argument did not yield any new bound on van der Waerden’s 
theorem (Exercise 6.3.7), as this theorem was used in the proof; note that one 
of the original motivations of Erdős and Turán in introducing this problem in
[99] was to obtain a more effective bound on van der Waerden’s theorem than 
the Ackermann-type bounds obtained by the usual proof methods. In 1972 Roth 
[288] obtained an alternative proof of the k = 4 case by combining the Fourier 
method with Szemerédi’s arguments, but again van der Waerden’s theorem was 
involved. 

In 1975, Szemerédi [345] finally established Theorem 10.1 for all k. The argu- 
ment is purely combinatorial. It uses the density increment argument, van der 
Waerden’s theorem, and an induction on k, although to execute this induction 
properly, a number of new combinatorial tools needed to be introduced, most 
notably the very useful and influential Szemerédi regularity lemma, which has 
already been discussed in Section 10.6. This lemma in particular introduces an 
energy increment argument to complement the density increment argument. This 
proof was elementary, and a technical masterpiece, but was rather complicated; 
while it did give (in principle) explicit quantitative bounds for Theorem 10.1, they 
were extremely large (involving iterated Ackermann functions). For reasons of 
space we will not present the original argument here. 



Since Szemerédi’s proof, several other important and very different proofs of 
this theorem have appeared (though as we will discuss later, all proofs share some- 
thing in common, namely an analysis of the dichotomy between randomness and 
structure). In 1977 Furstenberg [121], [125], [122] introduced an elegant ergodic 
approach which proved Szemerédi’s theorem for arbitrary k. The first observation 
was that there was a correspondence principle which showed that Theorem 10.1 
was equivalent to a recurrence theorem in the ergodic theory of measure-preserving systems,
now known as the Furstenberg multiple recurrence theorem. In the case k = 2 this theorem
was the classical Poincaré recurrence theorem, while Furstenberg observed (see
[122]) that the k = 3 case, i.e. Roth’s theorem, could be deduced from some spec- 
tral theory of the shift operator and some measure-theoretic constructions. Again, 
in the k ≥ 4 cases, spectral methods (which are the ergodic analog of Fourier-
analytic methods) proved insufficient to deduce the recurrence theorem; however,
by developing a structural theorem for measure-preserving systems (which can be 
viewed as an infinitary analog of the Szemerédi regularity lemma, and is proven by 
an infinitary version of the energy increment argument, requiring Zorn’s lemma), 
and by establishing a recurrence theorem for generalized almost periodic func- 
tions (this being a measure-theoretic analog of the van der Waerden theorem, and 
which in fact could be deduced from that theorem and some additional arguments), 
Furstenberg was able to obtain a conceptually simple, but non-elementary, proof of 
Szemerédi’s theorem for all k. This proof has since proven to be extremely flexible 
and has yielded many other recurrence theorems of a similar type, the majority 
of which have not yet been obtainable by other means. On the other hand, due to 
the infinitary nature of the argument, and the reliance on the axiom of choice, no
quantitative bound could be extracted from these arguments until very recently (see 
[357]). We shall briefly discuss this infinitary ergodic approach in Section 11.5, 
though we will not go into details as they require a certain amount of machinery 
not used elsewhere in this book. We will discuss the finitary version, which has 
less machinery but is more complicated, in Section 11.4. 

The next important observation, which first appeared in [304] in 1978, was 
that the Szemerédi regularity lemma could handle the k = 3 case (i.e. Roth’s 
theorem) directly, without need for van der Waerden’s theorem, see Section 10.6; 
indeed one could also obtain a generalization of this result to right-angled triangles 
which was previously obtainable by ergodic means [124]. However, it became 
increasingly apparent that in order to generalize this argument to higher k one 
would have to generalize Szemerédi’s regularity lemma to hypergraphs. There 
were a number of attempts in this direction, most notably by Frankl and Rödl 
[109], who managed to handle 3-uniform hypergraphs (which implies Szemerédi’s
theorem for arithmetic progressions of length 4). However, in the general case, even
stating a valid formulation of a hypergraph regularity lemma which would then
easily imply the full strength of Szemerédi’s theorem remained elusive until very
recently, when Gowers [140] and Rödl and Skokan [283], [284] finally derived a 
sufficiently strong hypergraph regularity lemma from which Theorem 10.1 easily 
follows. We will discuss this approach in Section 11.6. 

It is also worth mentioning that in 1988, Shelah [318] discovered the first 
quantitative bound on van der Waerden numbers which was not of Ackermann 
type (i.e. it was primitive recursive), although it was still extremely large, for 
instance larger than a tower exponential bound. As with the other direct proofs 
known for van der Waerden’s theorem, this result, while ingeniously simple, did 
not seem to offer any new proof of Szemerédi’s theorem, although it would improve 
the bound arising from Szemerédi’s argument somewhat (given that that argument 
requires the van der Waerden theorem). 

Another major breakthrough then occurred with the work of Gowers [137],
[138] in 1998 and 2001, who combined the Fourier-analytic methods of Roth
with methods of additive combinatorics (such as Freiman’s theorem, and the
Balog–Szemerédi–Gowers theorem), and several new ideas (such as introducing
the Gowers uniformity norms, and initiating a theory of quadratic Fourier analysis
as well as higher-order generalizations) to obtain a new proof of Szemerédi’s
theorem, first for k = 4 and then for general k. This proof gave for the first time bounds
in Szemerédi’s theorem which were comparable in strength to Roth’s original 
argument (though the analog of Bourgain’s bound for k = 3 has not yet been 
replicated for higher k). This argument also revealed for the first time the new 
structures (such as quadratic phase functions) which had to be understood in order 
to handle the cases k ≥ 4 and which were not present in the k = 3 theory. Interestingly,
very similar conclusions were also being drawn at the same time from the
ergodic theory approach, most notably through the recent work of Host and Kra 
[185] and Ziegler [386] but also from earlier work of Furstenberg, Weiss, Conze, 
Lesigne, and others. There are thus encouraging signs of a “higher-order Fourier 
analysis” which may yet unify the Fourier and ergodic approaches, but while 
developments here are proceeding rapidly, there does not yet appear to be a defini- 
tive and satisfactory theory in this regard. We shall discuss Gowers’ approach in 
Sections 11.1-11.3. 

Finally, an extremely recent development has been the coming together of many 
of the independent directions of research mentioned above, most notably in the 
recent result of Green and Tao [158] which establishes arbitrarily long progres- 
sions in the primes, or in any subset of the primes of positive relative density. 
This argument requires that one extend Szemerédi’s theorem to the setting of 
dense subsets of “pseudo-random” sets such as the almost primes. To achieve this, 
one uses a quantitative version of the ergodic theory approach, set in a finitary 
rather than infinitary setting (and with no use of the axiom of choice); to make
this scheme work, one must borrow heavily from the tools developed from the
other approaches to Szemerédi’s theorem, notably the energy increment strategy 
used to prove the regularity lemma, the Gowers uniformity norms used in the 
Fourier-analytic approach, and the relativization to pseudo-random sets which 
first appears in the graph and hypergraph regularity approach; in addition, some 
number-theoretic arguments (involving some analysis of the Riemann zeta function)
are necessary to ensure that the almost primes have the desired amount of
pseudo-randomness. We will discuss this result in Section 11.7. 

Unfortunately we will not have space to give a full proof of Szemerédi’s theorem 
or the Green–Tao theorem in this chapter, as all of the known proofs are either
quite lengthy or require a fair amount of supporting theory. We will however 
give several partial results and key lemmas, and describe the main steps of most of 
these proofs, referring the reader to the original references for the complete details. 
Thus this chapter should be viewed as an introduction to the very active current 
area of research surrounding Szemerédi’s theorem, rather than a comprehensive 
treatment of the field. One theme we wish to emphasize here is that, despite the
extraordinary diversity of methods and techniques in all the various proofs of
Szemerédi’s theorem, there are some very strong unifying themes among all
these approaches, such as the exploitation of a dichotomy between randomness
and structure, and this chapter intends to highlight such common themes between 
all the known proofs of Szemerédi’s theorem and related results. 


11.1 Gowers uniformity norms 


As in the previous chapter, it is convenient to attack Szemerédi’s theorem by
studying the k-linear form

\[ \Lambda_k(f_0, \ldots, f_{k-1}) := \mathbf{E}_{x, r \in Z} f_0(x) f_1(x + r) \cdots f_{k-1}(x + (k-1) r) \]

defined for any finite additive group Z and any functions $f_0, \ldots, f_{k-1} : Z \to \mathbf{C}$.
Thus, for instance, $\Lambda_k(1_A, \ldots, 1_A)$ is at least as large as $\mathbf{P}_Z(A)/|Z|$, and will be
larger if and only if A contains an arithmetic progression of length k with non-zero
step; note that such progressions are proper if |Z| is coprime to k!. This form of
course generalizes the form $\Lambda_3(f, g, h)$ which featured prominently in the previous
chapter.
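To make the definition concrete, here is a small brute-force sketch of $\Lambda_k$ on the cyclic group $\mathbf{Z}_N$. The function names `lambda_k` and `indicator`, and the choice N = 7, are illustrative assumptions of ours. It compares $\Lambda_3(1_A, 1_A, 1_A)$ with the contribution $\mathbf{P}_Z(A)/|Z|$ of the trivial progressions (r = 0), for one set with no non-trivial progressions and one set containing a progression.

```python
from fractions import Fraction


def lambda_k(fs, N):
    """Lambda_k(f_0, ..., f_{k-1}) on Z_N; each f_j is a list of N exact (int) values."""
    total = 0
    for x in range(N):
        for r in range(N):
            term = 1
            for j, f in enumerate(fs):
                term *= f[(x + j * r) % N]
            total += term
    return Fraction(total, N * N)   # expectation over all pairs (x, r)


def indicator(A, N):
    return [1 if x in A else 0 for x in range(N)]


N, k = 7, 3
for A in ({0, 1, 3}, {0, 1, 2}):
    one_A = indicator(A, N)
    value = lambda_k([one_A] * k, N)
    trivial = Fraction(len(A), N) / N     # P_Z(A)/|Z|, from the progressions with r = 0
    print(sorted(A), "Lambda_3 =", value, " trivial part =", trivial,
          " non-trivial progression present:", value > trivial)
```

Exact rational arithmetic is used so that equality with the trivial contribution can be tested without rounding issues.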

Just as Varnavides’ theorem (Theorem 10.9) is equivalent to Roth’s theorem 
(Theorem 10.8), Szemerédi’s theorem is equivalent to the following. 


Theorem 11.1 (Szemerédi’s theorem again) Let k ≥ 3, let Z be a finite cyclic
group of prime order |Z| > k, and let $f : Z \to \mathbf{R}^+$ be a non-negative function
which is not identically zero, and obeys the bound $0 \le f(x) \le 1$ for all $x \in Z$;
then

\[ \Lambda_k(f, \ldots, f) = \Omega_{k, \mathbf{E}_Z(f)}(1). \]


In fact, this theorem is valid for all finite abelian groups, not just the cyclic 
group, as we shall see in Section 11.6. 

Thus one strategy to prove Szemerédi’s theorem is to obtain good bounds for
quantities of the form $\Lambda_k(f_0, \ldots, f_{k-1})$ for various choices of $f_0, \ldots, f_{k-1}$. This
is the approach taken both by Gowers’ Fourier-analytic proof and by the finitary
ergodic proof (and variants of this strategy are also used in the infinitary ergodic
proof and the hypergraph proof). In the previous chapter, the linear bias norm
$\|f\|_{u^2(Z)}$ was used to control this quantity effectively when k = 3, but this norm
turns out to not be appropriate for higher k (see exercises). There are higher-order
generalizations $\|f\|_{u^{k-1}(Z)}$ of the linear bias norm which we will discuss later, but in
the absence of any useful quadratic or higher-order analog of Plancherel’s theorem,
it is difficult (though not impossible, see below) to use this norm to control $\Lambda_k$.
Instead, there is a related norm, the Gowers uniformity norm $\|f\|_{U^{k-1}(Z)}$, which is
more combinatorial than Fourier-analytic in nature, but controls the form $\Lambda_k$ very
easily. It is defined as follows.


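Definition 11.2 (Gowers uniformity norm) Let $f : Z \to \mathbf{C}$ and d ≥ 1. Then
the Gowers uniformity norm $\|f\|_{U^d(Z)}$ of order d is defined recursively by¹

\[ \|f\|_{U^1(Z)} := |\mathbf{E}_Z(f)|; \qquad \|f\|_{U^{d+1}(Z)} := \bigl( \mathbf{E}_{h \in Z} \|T^h f \, \overline{f}\|_{U^d(Z)}^{2^d} \bigr)^{1/2^{d+1}} \]

for all d ≥ 1, where $T^h f(x) := f(x + h)$ is the shift of f by h.

Thus for instance we have

\[ \|f\|_{U^2(Z)} = \bigl( \mathbf{E}_{h \in Z} |\mathbf{E}_Z(T^h f \, \overline{f})|^2 \bigr)^{1/4}
= \bigl( \mathbf{E}_{x, h_1, h_2 \in Z} f(x + h_1 + h_2) \overline{f(x + h_1)} \, \overline{f(x + h_2)} f(x) \bigr)^{1/4} \]

and more generally

\[ \|f\|_{U^d(Z)} = \Bigl( \mathbf{E}_{x, h_1, \ldots, h_d \in Z} \prod_{\omega \in \{0,1\}^d} C^{|\omega|} f(x + \omega \cdot h) \Bigr)^{1/2^d} \tag{11.1} \]

where $C f := \overline{f}$ is the conjugation operator, $\omega = (\omega_1, \ldots, \omega_d)$, $h := (h_1, \ldots, h_d)$,
and $|\omega| := \omega_1 + \cdots + \omega_d$. In the case of an indicator function $f = 1_A$, we have

\[ \|1_A\|_{U^d(Z)}^{2^d} = \mathbf{P}_{x, h_1, \ldots, h_d \in Z}\bigl( x + [0,1]^d \cdot (h_1, \ldots, h_d) \subset A \bigr); \tag{11.2} \]

thus $\|1_A\|_{U^d(Z)}$ is a normalized measure of how many d-dimensional cubes are
contained in A. In particular, we have the identity

\[ \|1_A\|_{U^2(Z)}^4 = E(A, A)/|Z|^3 \]

which relates the $U^2$ norm of $1_A$ to the additive energy of A, defined in Definition 2.8.

¹ It would also be consistent to define $\|f\|_{U^0(Z)} := \mathbf{E}_Z(f)$, but this quantity is signed and thus is too
pathological to be called a norm.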
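The recursive definition and the cube-counting formula (11.2) can be compared directly on a small cyclic group. The sketch below is illustrative only: the names `gowers_power`, `cube_density` and `additive_energy`, and the test set A = {0, 1, 3} in $\mathbf{Z}_7$, are our own assumptions. It computes $\|f\|_{U^d}^{2^d}$ from Definition 11.2 and checks it against (11.2) and against the additive energy identity.

```python
from itertools import product


def gowers_power(f, d):
    """||f||_{U^d(Z_N)}^{2^d} via the recursive Definition 11.2; f is a list of N numbers."""
    N = len(f)
    if d == 1:
        return abs(sum(f) / N) ** 2
    total = 0.0
    for h in range(N):
        # the function T^h f * conj(f)
        shifted = [f[(x + h) % N] * complex(f[x]).conjugate() for x in range(N)]
        total += gowers_power(shifted, d - 1)
    return total / N


def cube_density(A, N, d):
    """Proportion of (x, h_1, ..., h_d) whose cube x + {0,1}^d . h lies in A, as in (11.2)."""
    good = 0
    for x, *h in product(range(N), repeat=d + 1):
        if all((x + sum(w * hi for w, hi in zip(omega, h))) % N in A
               for omega in product((0, 1), repeat=d)):
            good += 1
    return good / N ** (d + 1)


def additive_energy(A, N):
    """Number of quadruples (a, b, c, e) in A^4 with a + b = c + e in Z_N."""
    return sum(1 for a, b, c, e in product(A, repeat=4) if (a + b - c - e) % N == 0)


N, A = 7, {0, 1, 3}
one_A = [1 if x in A else 0 for x in range(N)]
print("U^2 power:", gowers_power(one_A, 2), " energy / N^3:", additive_energy(A, N) / N ** 3)
print("U^3 power:", gowers_power(one_A, 3), " cube density:", cube_density(A, N, 3))
```

The recursion costs about $N^d$ operations, so this is only meant for very small N and d.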

At first glance, the Gowers uniformity norm $\|f\|_{U^{k-1}(Z)}$ looks even more
complicated than the expression $\Lambda_k(f, \ldots, f)$ which it is meant to control, but as
we shall see it has a significantly better structure which makes it more amenable
to analysis.

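In the d = 2 case the Gowers uniformity norm is also related to the Fourier
transform by the simple formula

\[ \|f\|_{U^2(Z)} = \|\hat f\|_{l^4(Z)}; \tag{11.3} \]

we leave the verification of this identity as an exercise. (Compare also with (4.18).)
In particular this shows that the $U^2(Z)$ norm is indeed a norm. It turns out that the
higher $U^d(Z)$ norms are norms as well. To see this it is convenient to introduce
the Gowers inner product $\langle (f_\omega)_{\omega \in \{0,1\}^d} \rangle_{U^d(Z)}$ of $2^d$ functions $f_\omega : Z \to \mathbf{C}$ by the formula

\[ \langle (f_\omega)_{\omega \in \{0,1\}^d} \rangle_{U^d(Z)} := \mathbf{E}_{x, h_1, \ldots, h_d \in Z} \prod_{\omega \in \{0,1\}^d} C^{|\omega|} f_\omega(x + \omega \cdot h). \]

Thus for instance

\[ \langle f_0, f_1 \rangle_{U^1(Z)} = \mathbf{E}_{x, h \in Z} \overline{f_1(x + h)} f_0(x) = \mathbf{E}_Z(f_0) \overline{\mathbf{E}_Z(f_1)} \]

\[ \langle f_{00}, f_{01}, f_{10}, f_{11} \rangle_{U^2(Z)} = \mathbf{E}_{x, h_1, h_2 \in Z} f_{11}(x + h_1 + h_2) \overline{f_{10}(x + h_1)} \, \overline{f_{01}(x + h_2)} f_{00}(x)
= \sum_{\xi \in Z} \hat f_{11}(\xi) \overline{\hat f_{10}(\xi)} \, \overline{\hat f_{01}(\xi)} \hat f_{00}(\xi). \]

Furthermore we see that

\[ \|f\|_{U^d(Z)} = \langle (f)_{\omega \in \{0,1\}^d} \rangle_{U^d(Z)}^{1/2^d}. \tag{11.4} \]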


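An application of the Cauchy–Schwarz inequality in the $h_d$ variable gives the
bound

\[ |\langle (f_\omega)_{\omega \in \{0,1\}^d} \rangle_{U^d(Z)}| \le \langle (f_{\omega', 0})_{\omega \in \{0,1\}^d} \rangle_{U^d(Z)}^{1/2} \, \langle (f_{\omega', 1})_{\omega \in \{0,1\}^d} \rangle_{U^d(Z)}^{1/2} \tag{11.5} \]

where $\omega' := (\omega_1, \ldots, \omega_{d-1}) \in \{0,1\}^{d-1}$ is the first d − 1 components of $\omega$.
Similarly for permutations. Applying this inequality d times one obtains

\[ |\langle (f_\omega)_{\omega \in \{0,1\}^d} \rangle_{U^d(Z)}| \le \prod_{\tilde\omega \in \{0,1\}^d} \langle (f_{\tilde\omega})_{\omega \in \{0,1\}^d} \rangle_{U^d(Z)}^{1/2^d}. \]

Applying (11.4), we conclude the Gowers–Cauchy–Schwarz inequality

\[ |\langle (f_\omega)_{\omega \in \{0,1\}^d} \rangle_{U^d(Z)}| \le \prod_{\omega \in \{0,1\}^d} \|f_\omega\|_{U^d(Z)}. \tag{11.6} \]

Applying (11.6) (with d replaced by d + 1) to the special case when $f_\omega := f$ for
$\omega_{d+1} = 0$ and $f_\omega := 1$ otherwise, one easily verifies the useful monotonicity formula

\[ \|f\|_{U^d(Z)} \le \|f\|_{U^{d+1}(Z)} \tag{11.7} \]

for all d ≥ 1. Next, from (11.4), multilinearity, and the Gowers–Cauchy–Schwarz
inequality we have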


\[ \|f_0 + f_1\|_{U^d(Z)}^{2^d} = \langle (f_0 + f_1)_{\omega \in \{0,1\}^d} \rangle_{U^d(Z)} \]
\[ = \sum_{I \subset \{0,1\}^d} \langle (f_{1_{\omega \in I}})_{\omega \in \{0,1\}^d} \rangle_{U^d(Z)} \]
\[ \le \sum_{I \subset \{0,1\}^d} \prod_{\omega \in \{0,1\}^d} \|f_{1_{\omega \in I}}\|_{U^d(Z)} \]
\[ = \prod_{\omega \in \{0,1\}^d} \bigl( \|f_0\|_{U^d(Z)} + \|f_1\|_{U^d(Z)} \bigr) \]

from which we deduce the Gowers triangle inequality

\[ \|f_0 + f_1\|_{U^d(Z)} \le \|f_0\|_{U^d(Z)} + \|f_1\|_{U^d(Z)}. \]

This argument should be compared with the standard derivation of the Hilbert
space triangle inequality from the Hilbert space Cauchy–Schwarz inequality.
Since the Gowers norm $\|f\|_{U^d(Z)}$ is clearly non-negative and homogeneous,
it is at least a semi-norm. When d = 1 it is not necessarily a norm (because
$\|f\|_{U^1(Z)} = |\mathbf{E}_Z(f)|$ can vanish without f being identically zero), but from (11.3)
and the injectivity of the Fourier transform we see that the $U^2$ norm, at least, is
a norm, and then (11.7) implies that the higher $U^d$ are also norms.
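As a sanity check (not a proof), the monotonicity formula (11.7) and the Gowers triangle inequality can be tested on random functions. This is a minimal sketch under our own assumptions: the group is $\mathbf{Z}_7$, the helper `gowers_norm` implements the recursive definition, and the seed and tolerance are arbitrary.

```python
import random


def gowers_norm(f, d):
    """||f||_{U^d(Z_N)} from the recursive definition; f is a list of N complex numbers."""
    N = len(f)
    if d == 1:
        return abs(sum(f) / N)
    total = 0.0
    for h in range(N):
        g = [f[(x + h) % N] * f[x].conjugate() for x in range(N)]   # T^h f * conj(f)
        total += gowers_norm(g, d - 1) ** (2 ** (d - 1))
    return (total / N) ** (1.0 / 2 ** d)


rng = random.Random(0)
N = 7
f0 = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(N)]
f1 = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(N)]
f_sum = [a + b for a, b in zip(f0, f1)]

norms = [gowers_norm(f0, d) for d in (1, 2, 3, 4)]
monotone = all(norms[i] <= norms[i + 1] + 1e-12 for i in range(3))
triangle = all(gowers_norm(f_sum, d) <= gowers_norm(f0, d) + gowers_norm(f1, d) + 1e-12
               for d in (1, 2, 3))
print("U^1..U^4 norms of f0:", norms)
print("monotone:", monotone, " triangle inequality:", triangle)
```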

We now relate the Gowers uniformity norms to the forms A, which are relevant 
to Szemerédi’s theorem. It is convenient to introduce the following notation: we 
use $b(x_1, \ldots, x_n)$ to denote any function of n variables $x_1, \ldots, x_n$ that is bounded
in magnitude by 1. As with the O() notation, the exact function used in the b()
notation will vary from case to case. This notation is useful whenever dealing 
with complicated multilinear expressions involving an interesting function f, and 
several other less interesting functions whose only important features are their 
boundedness and the precise set of variables that they depend on; the b() notation 
can then be used to conceal the uninteresting functions and focus attention on the 
important terms in the expression. 

We begin with a simple but very useful lemma, which controls the correlations 
of several functions $f_a$ with an arbitrary bounded function b(a), in terms of the
correlations of $f_a$ with (shifts of) itself.


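Lemma 11.3 (Van der Corput lemma) Let Z be a finite additive group, and let
A be a non-empty set. For each $a \in A$ let $f_a : Z \to \mathbf{C}$ be a function. Then we
have

\[ |\mathbf{E}_Z(\mathbf{E}_{a \in A} b(a) f_a)| \le |\mathbf{E}_{a \in A, h \in Z} \mathbf{E}_Z(T^h f_a \, \overline{f_a})|^{1/2}. \]

Proof From the triangle inequality followed by Cauchy–Schwarz we have

\[ |\mathbf{E}_Z(\mathbf{E}_{a \in A} b(a) f_a)| \le \mathbf{E}_{a \in A} |\mathbf{E}_Z(f_a)| \]
\[ \le \bigl( \mathbf{E}_{a \in A} |\mathbf{E}_Z(f_a)|^2 \bigr)^{1/2} \]
\[ = \bigl( \mathbf{E}_{a \in A} \mathbf{E}_{x, x' \in Z} f_a(x') \overline{f_a(x)} \bigr)^{1/2}. \]

The claim then follows by making the substitution x’ = x + h.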
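The inequality of Lemma 11.3 can be checked numerically on random data. The sketch below is purely illustrative: the sizes N and M, the seed, and the helper `rand_unit_disc` are our own assumptions. The index set A is taken to be {0, ..., M − 1} and the group to be $\mathbf{Z}_N$.

```python
import cmath
import random

rng = random.Random(1)
N, M = 11, 5      # |Z| = N and |A| = M; both sizes are arbitrary choices


def rand_unit_disc():
    """A random complex number of magnitude at most 1."""
    return cmath.rect(rng.random(), rng.uniform(0, 2 * cmath.pi))


f = [[rand_unit_disc() for _ in range(N)] for _ in range(M)]   # f[a][x] = f_a(x)
b = [rand_unit_disc() for _ in range(M)]                       # |b(a)| <= 1

# Left-hand side: |E_x E_a b(a) f_a(x)|
lhs = abs(sum(b[a] * f[a][x] for a in range(M) for x in range(N)) / (M * N))

# Right-hand side: |E_{a,h} E_x f_a(x + h) conj(f_a(x))|^{1/2}
inner = sum(f[a][(x + h) % N] * f[a][x].conjugate()
            for a in range(M) for h in range(N) for x in range(N)) / (M * N * N)
rhs = abs(inner) ** 0.5

print("lhs =", lhs, " rhs =", rhs, " inequality holds:", lhs <= rhs + 1e-12)
```

Note that the averaged quantity `inner` equals $\mathbf{E}_a |\mathbf{E}_Z f_a|^2$, which is the substitution used at the end of the proof.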














As a consequence we have 


Lemma 11.4 (Generalized von Neumann theorem) Let Z be a finite additive
group, let k ≥ 2, and let $c_0, \ldots, c_{k-1}$ be distinct integers such that $c_i - c_j$ is
coprime to |Z| for all distinct i, j. Then for any function $f : Z \to \mathbf{C}$ we have

\[ |\mathbf{E}_{x, r \in Z} f(x + c_0 r) b(x + c_1 r) \cdots b(x + c_{k-1} r)| \le \|f\|_{U^{k-1}(Z)}. \]

As a particular corollary, we see that if |Z| is coprime to (k − 1)! then we have

\[ |\Lambda_k(f_0, \ldots, f_{k-1})| \le \min_{0 \le j \le k-1} \|f_j\|_{U^{k-1}(Z)} \tag{11.8} \]

whenever $f_0, \ldots, f_{k-1} : Z \to \mathbf{C}$ are bounded in magnitude by 1. This result should
be compared with Proposition 10.11.


Proof We induct on k. When k = 2 we observe that the map $(x, r) \mapsto (x + c_0 r,
x + c_1 r)$ is bijective on $Z \times Z$ (since |Z| is coprime to $c_0 - c_1$) and hence

\[ |\mathbf{E}_{x, r \in Z} f(x + c_0 r) b(x + c_1 r)| = |\mathbf{E}_Z(f) \mathbf{E}_Z(b)| \le |\mathbf{E}_Z(f)| = \|f\|_{U^1(Z)} \]

as desired. Now suppose that k ≥ 3 and the claim has already been proven for
k − 1. By shifting x by $c_{k-1} r$ if necessary (and replacing $c_j$ by $c_j - c_{k-1}$) we may
take $c_{k-1} = 0$, so we can write the left-hand side as

\[ |\mathbf{E}_{x \in Z} b(x) \mathbf{E}_{r \in Z} f(x + c_0 r) b(x + c_1 r) \cdots b(x + c_{k-2} r)|. \]

Applying Lemma 11.3, we can bound this by

\[ |\mathbf{E}_{x, h \in Z} \mathbf{E}_{r \in Z} (T^{c_0 h} f \, \overline{f})(x + c_0 r) b(x + c_1 r, h) \cdots b(x + c_{k-2} r, h)|^{1/2}. \]

Applying the induction hypothesis, we can bound this by

\[ \bigl( \mathbf{E}_{h \in Z} \|T^{c_0 h} f \, \overline{f}\|_{U^{k-2}(Z)} \bigr)^{1/2}. \]

Since $c_0 = c_0 - c_{k-1}$ is coprime to |Z| we can change variables and replace $c_0 h$
by h. Applying Hölder we can then bound the previous expression by

\[ \bigl( \mathbf{E}_{h \in Z} \|T^h f \, \overline{f}\|_{U^{k-2}(Z)}^{2^{k-2}} \bigr)^{1/2^{k-1}} \]

and the claim now follows from the recursive definition of the $U^{k-1}(Z)$ norm.
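The corollary (11.8) lends itself to a quick numerical test. The following sketch is an illustration under our own assumptions: k = 4 and $Z = \mathbf{Z}_7$ (so that |Z| is coprime to (k − 1)! = 6), random functions of magnitude at most 1, and helper names `gowers_norm` and `lambda_k` chosen by us.

```python
import cmath
import random


def gowers_norm(f, d):
    """||f||_{U^d(Z_N)} from the recursive definition; f is a list of N complex numbers."""
    N = len(f)
    if d == 1:
        return abs(sum(f) / N)
    total = 0.0
    for h in range(N):
        g = [f[(x + h) % N] * f[x].conjugate() for x in range(N)]
        total += gowers_norm(g, d - 1) ** (2 ** (d - 1))
    return (total / N) ** (1.0 / 2 ** d)


def lambda_k(fs):
    """Lambda_k(f_0, ..., f_{k-1}) on Z_N, computed by brute force."""
    N = len(fs[0])
    total = 0
    for x in range(N):
        for r in range(N):
            term = 1
            for j, f in enumerate(fs):
                term *= f[(x + j * r) % N]
            total += term
    return total / (N * N)


rng = random.Random(2)
N, k = 7, 4                      # 7 is coprime to (k - 1)! = 6
fs = [[cmath.rect(rng.random(), rng.uniform(0, 2 * cmath.pi)) for _ in range(N)]
      for _ in range(k)]         # random functions with |f_j| <= 1
lhs = abs(lambda_k(fs))
rhs = min(gowers_norm(f, k - 1) for f in fs)
print("|Lambda_4| =", lhs, " min_j ||f_j||_{U^3} =", rhs, " (11.8) holds:", lhs <= rhs + 1e-12)
```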














Let us informally refer to a function f as Gowers uniform of order k − 2 if the
quantity $\|f\|_{U^{k-1}(Z)}$ is small. It is easy to verify the bounds

\[ \|f\|_{u^2(Z)} \le \|f\|_{U^2(Z)} \le \|f\|_{u^2(Z)}^{1/2} \tag{11.9} \]

whenever f is bounded in magnitude by 1 (see exercises); thus Gowers uniformity
of order 1 is the same as linear (or Fourier) uniformity. In analogy with this,
we shall refer to Gowers uniformity of order 2 as quadratic uniformity, Gowers 
uniformity of order 3 as cubic uniformity, and so forth. A partial explanation for 
this terminology can be found in Exercise 11.1.12; see also the next section. 
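The identities (11.3) and (11.9) are easy to check numerically in $\mathbf{Z}_N$. The sketch below makes two labelled assumptions: the Fourier transform is normalized as $\hat f(\xi) = \mathbf{E}_{x} f(x) e(-x\xi/N)$, and the linear bias norm $\|f\|_{u^2(Z)}$ is taken to be $\sup_\xi |\hat f(\xi)|$, which is how we read the previous chapter. The names `fourier` and `u2_norm` are ours.

```python
import cmath
import random


def fourier(f):
    """Fourier coefficients f^(xi) = E_x f(x) e(-x xi / N) on Z_N (assumed normalization)."""
    N = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * x * xi / N) for x in range(N)) / N
            for xi in range(N)]


def u2_norm(f):
    """||f||_{U^2(Z_N)} computed in physical space from Definition 11.2."""
    N = len(f)
    total = 0.0
    for h in range(N):
        total += abs(sum(f[(x + h) % N] * f[x].conjugate() for x in range(N)) / N) ** 2
    return (total / N) ** 0.25


rng = random.Random(3)
N = 13
f = [cmath.rect(rng.random(), rng.uniform(0, 2 * cmath.pi)) for _ in range(N)]  # |f| <= 1

f_hat = fourier(f)
l4 = sum(abs(c) ** 4 for c in f_hat) ** 0.25     # ||f^||_{l^4}
bias = max(abs(c) for c in f_hat)                # assumed definition of the linear bias norm
U2 = u2_norm(f)

print("U^2 norm:", U2, " l^4 norm of f^:", l4)
print("(11.9):", bias, "<=", U2, "<=", bias ** 0.5)
```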

The estimate (11.8) shows that functions which are Gowers uniform of order 
k — 2 are negligible for the purposes of counting progressions of length k. One is 
then naturally led to the strategy of approximating an arbitrary function f by a much
more structured function, up to errors which are Gowers uniform. For instance,
if one is lucky enough that f − E(f) is Gowers uniform of order k − 2, then one
can use (11.8) to approximate $\Lambda_k(f, \ldots, f)$ by $\Lambda_k(\mathbf{E}(f), \ldots, \mathbf{E}(f)) = \mathbf{E}(f)^k$. Of
course, it is not always the case that f — E(f) is Gowers uniform. In such an event, 
it is important to understand which functions are not Gowers uniform, and more 
precisely what the obstructions to Gowers uniformity are. This will be the focus 
of the next section. 
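The approximation step mentioned above can be made quantitative by a telescoping argument. The following is a sketch under stated assumptions: f takes values in [0, 1], |Z| is coprime to (k − 1)!, and Lemma 11.4 is applied with the function f − E(f) placed in the j-th slot, which is permitted because the coefficients $c_0, \ldots, c_{k-1}$ in that lemma may be listed in any order.

```latex
\[
\Lambda_k(f, \ldots, f) - \mathbf{E}(f)^k
= \sum_{j=0}^{k-1}
  \Lambda_k\bigl( \underbrace{\mathbf{E}(f), \ldots, \mathbf{E}(f)}_{j \text{ slots}},
                  \, f - \mathbf{E}(f), \,
                  \underbrace{f, \ldots, f}_{k-1-j \text{ slots}} \bigr),
\]
% every slot other than the j-th holds a function bounded in magnitude by 1, so
\[
\bigl| \Lambda_k(f, \ldots, f) - \mathbf{E}(f)^k \bigr|
\le k \, \| f - \mathbf{E}(f) \|_{U^{k-1}(Z)} .
\]
```

In particular, if $\|f - \mathbf{E}(f)\|_{U^{k-1}(Z)}$ is much smaller than $\mathbf{E}(f)^k / k$, then $\Lambda_k(f, \ldots, f)$ is comparable to $\mathbf{E}(f)^k$.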


Exercises 


11.1.1 Show that Theorem 11.1 is equivalent to Theorem 10.1. Also show that 
to prove Theorem 11.1 it suffices to do so in the special case when f is 
an indicator function, f = 14. 
11.1.2 Let $Z = \mathbf{Z}_N$ for some prime N > 3, let $\xi$ be a non-zero element of Z,
and define the functions
$f_0(x) := e(\xi x^2/N)$;
$f_1(x) := e(-3\xi x^2/N)$;
$f_2(x) := e(3\xi x^2/N)$;
$f_3(x) := e(-\xi x^2/N)$.
Show that $\Lambda_4(f_0, f_1, f_2, f_3) = 1$, but that $\|f_j\|_{u^2(Z)} = N^{-1/2}$ for j =
0, 1, 2, 3. This shows that there is no direct analog of Proposition 10.11.


Modify this example to show that there is no direct analog of Proposition 10.10
either. (Hint: it is simpler to construct an example in a vector
space such as $\mathbf{F}_p^n$, based on a quadratic hypersurface, than in a cyclic group
such as $\mathbf{Z}_N$, which would require some sort of “quadratic Bohr set”.)

11.1.3 Modify the proof of the Gowers triangle inequality to provide a proof of
the triangle inequality for $l^{2^d}(Z)$ for d = 1, 2, 3, ... based purely on the
Cauchy–Schwarz inequality.

11.1.4 Prove (11.1) and (11.2).

11.1.5 Prove (11.3). Use (11.3) and Plancherel’s theorem to prove (11.9).

11.1.6 Prove (11.5).

11.1.7 Prove (11.7).

11.1.8 For any finite additive group Z, any $f : Z \to \mathbf{C}$, and any d ≥ 1, show
that $\|\overline{f}\|_{U^d(Z)} = \|f\|_{U^d(Z)}$ and $\|f\|_{U^d(Z)} \le \|f\|_{L^\infty(Z)}$.

11.1.9 Let $\phi : Z \to Z'$ be a Freiman isomorphism of order 2 from Z to Z’.
Show that $\|f \circ \phi\|_{U^d(Z)} = \|f\|_{U^d(Z')}$ for any d ≥ 1 and any $f : Z' \to \mathbf{C}$.
In particular we have the translation invariance $\|T^h f\|_{U^d(Z)} = \|f\|_{U^d(Z)}$
for any $h \in Z$.

11.1.10 If $f : Z \to \mathbf{C}$ and $f' : Z' \to \mathbf{C}$ are functions on two finite additive
groups Z, Z’, show that $\|f \otimes f'\|_{U^d(Z \times Z')} = \|f\|_{U^d(Z)} \|f'\|_{U^d(Z')}$.

11.1.11 Use (11.8) to give another proof that the $\|f\|_{U^k(Z)}$ norms are
non-degenerate for k ≥ 2, at least in the case when |Z| is coprime to (k − 1)!.

11.1.12 Let d ≥ 1, let $F = \mathbf{F}_p$ be a field of prime order p > d, and let
$P : F \to F$ be a polynomial of degree exactly d with coefficients in
F. Let $f : F \to \mathbf{C}$ be the function $f(x) := e(P(x)/p)$, where the map
$x \mapsto x/p$ is defined from F to $\mathbf{R}/\mathbf{Z}$ in the obvious manner. Show that
$\|f\|_{U^{d'}(F)} = 1$ for all d’ > d, but that $\|f\|_{U^{d'}(F)} \le ((d-1)/p)^{1/2^d}$ for
all 1 ≤ d’ ≤ d; this shows that the $U^d(F)$ norms are genuinely different
for each 1 ≤ d < p. Informally, we see that f is Gowers uniform of
order d − 1 or less, but is not Gowers uniform of order d or more. In
particular establish the Weyl exponential sum estimate

\[ \sum_{x \in F} e(P(x)/p) = O(p^{1 - 1/2^d}). \]

Compare this with Lemma 4.14.

11.1.13 For any finite additive group Z and any $f : Z \to \mathbf{C}$, show that

\[ \|f\|_{U^d(Z)} \le \|f\|_{L^{2^d/(d+1)}(Z)} \]

for all d ≥ 1, and that $\lim_{d \to \infty} \|f\|_{U^d(Z)} = \|f\|_{L^\infty(Z)}$. Show that the
exponent $2^d/(d + 1)$ cannot be replaced by any smaller quantity. (Hint:
consider a Dirac mass, or the characteristic function of a subgroup.)


424 11 Szemerédi’s theorem for k > 3 


11.1.14 Let G be a subgroup of a finite additive group Z, and let f : Z > C 
and d > 1. For each coset y + G of G, define || f||y«()4q in the obvious 
translation-invariant manner. Show that 

d 
lflvz < (Eyez bf gre yor? 

thus generalizing the previous exercise and demonstrating that “local 
uniformity norms” control “global uniformity norms”. 

11.1.15 Let $f : Z \to \mathbf{C}$ be a function on a finite additive group $Z$. Establish the
Parseval-type identity $\|f\|_{U^3(Z)} = |Z|^{1/2} \|\hat f\|_{U^3(Z)}$, which shows that the
Fourier transform does not simplify the $U^3$ norm. (This phenomenon is
related to the fact that the Fourier transform of a Gaussian is again a
Gaussian.) Deduce a similar Plancherel-type identity for the $U^3$ inner product.
For the higher $U^d$ norms, $d \ge 4$, the situation is even worse; the Fourier
representation is more complicated than the spatial representation.

11.1.16 Let $A$ be a subset of a finite additive group $Z$, and let $d \ge 1$. Using
(11.7), show that there are at least $\mathbf{P}_Z(A)^{2^d} |Z|^{d+1}$ $(d+1)$-tuples
$(x, h_1, \ldots, h_d) \in Z^{d+1}$ such that the cube $x + \{0,1\}^d \cdot (h_1, \ldots, h_d)$ is
contained in $A$. Use this to give another proof of Lemma 10.49.

11.1.17 Let $f : Z \to \mathbf{C}$ be a random function such that the random variables
$f(x)$ for $x \in Z$ are jointly independent, have mean zero, and are bounded
by 1. Show that $\mathbf{E}\|f\|_{U^d(Z)}^{2^d} = O_d(1/|Z|)$ for all $d \ge 1$. Thus random
balanced functions tend to be Gowers uniform of very high order.
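
As a numerical sanity check on three of the exercises above (it is not part of the original text and proves nothing), the following Python sketch evaluates Gowers norms by brute force on the small cyclic group $\mathbf{Z}_5$. It checks (a) the two norm values for the quadratic phase $e(x^2/p)$, (b) the $U^3$ Fourier identity of Exercise 11.1.15, and (c) the cube count of Exercise 11.1.16. The helper names (`gowers_pow`, `quad`, `g_hat`, `cube_count`), the choice $p = 5$, the test set `A`, and the normalisation $\hat f(\xi) = \mathbf{E}_x f(x) e(-x\xi/p)$ are all assumptions made for this illustration.

```python
import cmath
import random
from itertools import product


def e(t):
    """e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * cmath.pi * t)


def gowers_pow(f, d):
    """Brute-force value of ||f||_{U^d(Z_N)}^{2^d}, f given as a list of length N."""
    N = len(f)
    total = 0
    for x in range(N):
        for hs in product(range(N), repeat=d):
            term = 1
            for omega in product((0, 1), repeat=d):
                v = f[(x + sum(w * h for w, h in zip(omega, hs))) % N]
                # conjugate at cube vertices with an odd number of ones
                term *= v.conjugate() if sum(omega) % 2 else v
            total += term
    return (total / N ** (d + 1)).real


p = 5  # illustrative small prime

# (a) quadratic phase f(x) = e(x^2/p): degree d = 2
quad = [e(x * x / p) for x in range(p)]
u2_pow = gowers_pow(quad, 2)  # ||f||_{U^2}^4, compare with (d-1)/p = 1/p
u3_pow = gowers_pow(quad, 3)  # ||f||_{U^3}^8, compare with 1

# (b) U^3 norm of g versus U^3 norm of its Fourier transform
random.seed(0)
g = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) / 2 for _ in range(p)]
g_hat = [sum(g[x] * e(-x * xi / p) for x in range(p)) / p for xi in range(p)]
lhs = gowers_pow(g, 3)               # ||g||_{U^3}^8
rhs = p ** 4 * gowers_pow(g_hat, 3)  # (|Z|^{1/2} ||g_hat||_{U^3})^8

# (c) number of (x, h1, h2) whose 2-dimensional cube lies inside A
A = {0, 1, 3, 4}
indicator = [complex(x in A) for x in range(p)]
cube_count = round(gowers_pow(indicator, 2) * p ** 3)
lower_bound = (len(A) / p) ** 4 * p ** 3  # P_Z(A)^{2^d} |Z|^{d+1} with d = 2
```

In this small case the bound in (a) is attained with equality, and in (c) the count is only slightly above `lower_bound`, which illustrates that the bound from (11.7) can be close to sharp for dense sets.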


11.2 Hard obstructions to uniformity 


In this section we consider the following inverse problem: suppose $f : Z \to \mathbf{C}$ is a
function bounded in magnitude by one which fails to be Gowers uniform of some
order $k - 2$, say $\|f\|_{U^{k-1}(Z)} \ge \delta$ for some $0 < \delta \le 1$. What structural information
can one then conclude about $f$? As it turns out, a sufficiently strong answer to this
question will lead to a proof of Szemerédi’s theorem for progressions of length $k$.
This is the strategy employed by Gowers [137], [138] in his proof of Szemerédi’s
theorem, with the focus on obtaining as strong an inverse theorem for the $U^{k-1}(Z)$
norm as possible. In Section 11.4 we describe a slightly different approach in which
one obtains a much weaker (and easier to prove) inverse theorem, but one which
is still sufficient to obtain Szemerédi’s theorem (but with much worse quantitative
bounds).

A good model case is provided by the case $k = 3$. From (11.9) we see that if
$\|f\|_{U^2(Z)} \ge \delta$, then $\|f\|_{u^2(Z)} \ge \delta^2$, and hence there exists a linear phase function
$g(x) := e(\xi \cdot x)$ which has a large inner product with $f$: $|\langle f, g \rangle_{L^2(Z)}| \ge \delta^2$. This
fact, combined with (11.8), can be used to give a variant of Proposition 10.10 or
Proposition 10.11, which in turn can be employed in either a density increment
argument or energy increment argument to prove the $k = 3$ case of Szemerédi’s
theorem, as was done in Sections 10.2, 10.3 and Section 10.5 respectively. One can
view these linear phase functions as being the obstructions to Gowers uniformity
of order 1; we have just seen that failure of Gowers uniformity of order 1 implies
correlation with one of these linear phase functions, and conversely the other
inequality in (11.9) implies that correlation with a linear phase function implies
lack of Gowers uniformity of order 1.

This model case, combined with the observations in Exercise 11.1.12, suggests
that, more generally, lack of Gowers uniformity of order k — 2 should be tied to 
correlation with a phase function which is somehow polynomial of degree k — 2. 
This can be made precise as follows. 


Definition 11.5 (Polynomial bias) Let $Z$ be a finite additive group, and let
$\phi : Z \to \mathbf{R}/\mathbf{Z}$ be a phase function. Given any $h \in Z$, we define the difference operator
$(h \cdot \nabla)$ applied to $\phi$ as

$$(h \cdot \nabla)\phi(x) := \phi(x + h) - \phi(x).$$

We will sometimes subscript $\nabla$ by $\nabla_x$ to emphasize the variable being differenced
over (in case $\phi$ also depends on some other variables). If $d \ge 1$, we say that $\phi$ is a
phase polynomial of degree less than $d$ if we have

$$(h_1 \cdot \nabla) \cdots (h_d \cdot \nabla)\phi(x) = 0 \quad \text{for all } x, h_1, \ldots, h_d \in Z.$$

Phase polynomials of degree less than 2 will be referred to as linear, phase
polynomials of degree less than 3 will be referred to as quadratic, and so forth. If
$f : Z \to \mathbf{C}$ is a function, then we define the polynomial bias of $f$ of degree $d$ to
be the quantity

$$\|f\|_{u^d(Z)} := \sup_\phi |\langle f, e(\phi) \rangle_{L^2(Z)}| = \sup_\phi |\mathbf{E}_{x \in Z} f(x) e(-\phi(x))|$$

where $\phi$ ranges over all phase polynomials of degree less than $d$.
More generally, if $B \subset Z$ is non-empty, we say that $\phi : B \to \mathbf{R}/\mathbf{Z}$ is a locally
polynomial phase function of degree less than $d$ if we have

$$(h_1 \cdot \nabla_x) \cdots (h_d \cdot \nabla_x)\phi(x) = 0 \quad \text{whenever } x + \{0,1\}^d \cdot (h_1, \ldots, h_d) \subset B,$$

and then define

$$\|f\|_{u^d(B)} := \sup_\phi |\langle f, e(\phi) \rangle_{L^2(B)}| = \sup_\phi |\mathbf{E}_{x \in B} f(x) e(-\phi(x))|$$

where $\phi$ ranges over all phase functions which are locally polynomial on $B$ of
degree less than $d$.
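
To make the difference operator concrete, here is a minimal Python sketch (an illustration, not part of the original text) on the cyclic group $\mathbf{Z}_N$. It assumes that the phase takes values in $\frac{1}{N}\mathbf{Z}/\mathbf{Z}$, so that $\phi$ can be stored exactly as a list of integer numerators modulo $N$; the function names `difference` and `is_phase_polynomial` and the sample phases are hypothetical choices made here.

```python
from itertools import product


def difference(phi, h, N):
    """(h . nabla) phi for phi: Z_N -> (1/N)Z/Z, stored as numerators mod N."""
    return [(phi[(x + h) % N] - phi[x]) % N for x in range(N)]


def is_phase_polynomial(phi, N, d):
    """True if every d-fold iterated difference of phi vanishes identically,
    i.e. phi is a phase polynomial of degree less than d."""
    for hs in product(range(N), repeat=d):
        psi = phi
        for h in hs:
            psi = difference(psi, h, N)
        if any(psi):
            return False
    return True


N = 7
quadratic = [(3 * x * x + 2 * x + 1) % N for x in range(N)]  # (3x^2 + 2x + 1)/N
cubic = [(x ** 3) % N for x in range(N)]                     # x^3/N

quadratic_is_deg_lt_3 = is_phase_polynomial(quadratic, N, 3)
quadratic_is_deg_lt_2 = is_phase_polynomial(quadratic, N, 2)
cubic_is_deg_lt_3 = is_phase_polynomial(cubic, N, 3)
cubic_is_deg_lt_4 = is_phase_polynomial(cubic, N, 4)
```

The exhaustive loop costs on the order of $N^{d+1}$ operations, so this is only meant for very small $N$ and $d$.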




To illustrate this definition, first observe that the only phase polynomials of
degree less than 1 are the constants $\phi(x) = c$, and hence

$$\|f\|_{u^1(Z)} = |\mathbf{E}_Z(f)| = \|f\|_{U^1(Z)}. \qquad (11.10)$$

Thus $\|\cdot\|_{u^1(Z)}$ is a seminorm. For $d > 1$, one easily verifies that $\|\cdot\|_{u^d(Z)}$ is a genuine
norm. For instance, from Exercise 4.1.4, we see that the only phase polynomials of
degree less than 2 are the linear phases $\phi(x) = \xi \cdot x + c$, and thus the definition of
the $u^2$ norm matches the one given in (10.5). In particular we still have the relation
(11.9).

More generally, the $u^d(Z)$ and $U^d(Z)$ norms are quite related, enjoying the
same symmetries. For instance, if $\phi$ is a phase polynomial of degree less than $d$,
then one can easily verify that

$$\|f e(-\phi)\|_{u^d(Z)} = \|f\|_{u^d(Z)}; \quad \|f e(-\phi)\|_{U^d(Z)} = \|f\|_{U^d(Z)}. \qquad (11.11)$$

In particular, from (11.7) we have

$$|\mathbf{E}_{x \in Z} f(x) e(-\phi(x))| = \|f e(-\phi)\|_{U^1(Z)} \le \|f e(-\phi)\|_{U^d(Z)} = \|f\|_{U^d(Z)}$$

and hence on taking suprema we have

$$\|f\|_{u^d(Z)} \le \|f\|_{U^d(Z)}.$$
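
The inequality just derived can be checked numerically in the simplest case $d = 2$ on a cyclic group. The Python sketch below is an illustration only. It assumes the identification of linear phases on $\mathbf{Z}_N$ with $x \mapsto \xi x / N + c$, so that $\|f\|_{u^2}$ is the largest Fourier coefficient in magnitude. It also uses the standard identity $\|f\|_{U^2}^4 = \sum_\xi |\hat f(\xi)|^4$, which is not stated in this section and is only compared against the brute-force value. The variable names and the choice $N = 8$ are hypothetical.

```python
import cmath
import random


def e(t):
    """e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * cmath.pi * t)


N = 8
random.seed(1)
f = [e(random.random()) for _ in range(N)]  # a random unimodular function

# Fourier coefficients with the normalisation f_hat(xi) = E_x f(x) e(-xi x / N)
f_hat = [sum(f[x] * e(-xi * x / N) for x in range(N)) / N for xi in range(N)]

# polynomial bias of degree 2: supremum over linear phases
u2_bias = max(abs(c) for c in f_hat)

# Gowers U^2 norm straight from the definition
total = 0
for x in range(N):
    for h1 in range(N):
        for h2 in range(N):
            total += (f[x]
                      * f[(x + h1) % N].conjugate()
                      * f[(x + h2) % N].conjugate()
                      * f[(x + h1 + h2) % N])
U2_pow4 = (total / N ** 3).real
U2 = U2_pow4 ** 0.25

fourth_moment = sum(abs(c) ** 4 for c in f_hat)
```

For this bounded $f$ one should see `u2_bias <= U2 <= u2_bias ** 0.5`, matching the two directions used in the $k = 3$ model case.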


Thus correlation with a phase function of degree less than $d$ implies lack of Gowers
uniformity of order $d - 1$. In light of (11.10), (11.9), one may hope that a converse
statement is true, namely that lack of Gowers uniformity of order $d - 1$ implies a
correlation with a phase of degree less than $d$. One hopeful sign in this direction
is the identity

$$\|e(\phi)\|_{U^d(Z)}^{2^d} = \mathbf{E}_{x, h_1, \ldots, h_d \in Z}\, e\bigl((h_1 \cdot \nabla_x) \cdots (h_d \cdot \nabla_x)\phi(x)\bigr) \qquad (11.12)$$

whose verification we leave as an exercise. This suggests, though does not quite
prove, that a function has large $U^d(Z)$ norm if and only if its phase is approximately
polynomial of degree less than $d$. The above statement would then be an assertion
that a phase which is approximately polynomial of degree $d$ in fact correlates
with a genuine polynomial of degree $d$. Such an assertion should remind one of
the Balog–Szemerédi–Gowers theorem, Theorem 2.29, and in fact that theorem
plays a key role in establishing facts such as these.

In the case when Z is a vector space over a finite field of small order and k = 4, 
we can formalize these conjectures affirmatively as follows. 


Theorem 11.6 (Inverse theorem for $U^3(F^n)$) [137], [160] Let $Z$ be a vector
space over a finite field $F$, let $f : Z \to \mathbf{C}$ have magnitude bounded by 1, such
that $\|f\|_{U^3(Z)} \ge \eta$ for some $0 < \eta \le 1$. Then there exists a subspace $W \subset Z$
with

$$\dim_F(W) \ge \dim_F(Z) - O(\eta^{-O(1)}) \qquad (11.13)$$

such that

$$\mathbf{E}_{y \in Z} \|f\|_{u^3(y+W)} = \Omega(\eta^{O(1)}). \qquad (11.14)$$

In particular, there exists $y \in Z$ such that $\|f\|_{u^3(y+W)} = \Omega(\eta^{O(1)})$.


The proof of this inverse theorem is quite lengthy, using techniques from pre- 
vious chapters as well as a heavy reliance on Fourier-analytic methods and the 
van der Corput lemma (Lemma 11.3), and will be deferred to the next section. 
We remark that the case when F has characteristic 2 was not quite dealt with in 
the above-cited papers, but requires an additional observation of Samorodnitsky 
(private communication). Assuming it for now, we can now prove Szemerédi’s 
theorem for vector spaces Z and in the case k = 4. In fact the inverse theorem 
allows us to give both a density increment proof and an energy increment proof. 
The density increment proof is based on the following proposition, analogous to
(though in some respects somewhat weaker than) Lemma 10.15.


Proposition 11.7 (Lack of uniformity implies density increment) Let $Z$ be a
vector space over a finite field $F$ of odd prime order, and let $f : Z \to \mathbf{C}$ have
magnitude bounded by 1 and be such that $\mathbf{E}_Z(f) = 0$ and $\|f\|_{U^3(Z)} \ge \eta$ for some
$0 < \eta \le 1$. Then there exists a subspace $Z'$ of $Z$ with

$$\dim_F(Z') \ge \frac{1}{2}\dim_F(Z) - O(\eta^{-O(1)})$$

and a point $x_0 \in Z$, such that

$$\mathbf{E}_{x \in x_0 + Z'} f(x) = \Omega(\eta^{O(1)}).$$


Proof From Theorem 11.6 we can find a subspace $W$ obeying the dimension
bound (11.13) and the correlation bound (11.14). Also note that if $\mathbf{E}_{y+W}(f) =
\Omega(\eta^{O(1)})$ for even a single $y \in Z$ then we will be done, so we may take $\mathbf{E}_{y+W}(f) \le
c\eta^C$ for any given absolute constants $c, C > 0$. Since we also have

$$\mathbf{E}_{y \in Z} \mathbf{E}_{y+W}(f) = \mathbf{E}_Z(f) = 0$$

we conclude that

$$\mathbf{E}_{y \in Z} |\mathbf{E}_{y+W}(f)| = 2\,\mathbf{E}_{y \in Z} \max(\mathbf{E}_{y+W}(f), 0) \le 2c\eta^C$$

and so from (11.14) we have (choosing the constants $c, C$ appropriately)

$$\mathbf{E}_{y \in Z}\bigl(\|f\|_{u^3(y+W)} - 2|\mathbf{E}_{y+W}(f)|\bigr) = \Omega(\eta^{O(1)}).$$

In particular we can find $y \in Z$ such that

$$\|f\|_{u^3(y+W)} \ge 2|\mathbf{E}_{y+W}(f)| + \Omega(\eta^{O(1)}).$$

By translating $f$ by $y$ if necessary we may take $y = 0$. By definition of the $u^3$ norm
and Exercise 11.2.6, we can thus find a self-adjoint linear operator $M : W \to W$
and $\xi \in W$ such that

$$|\mathbf{E}_{x \in W} f(x) e(-Mx \cdot x) e(-\xi \cdot x)| \ge 2|\mathbf{E}_W(f)| + \Omega(\eta^{O(1)}).$$
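Observe that the quantity $Mx \cdot x + \xi \cdot x$ takes at most $|F|$ values. If we
thus partition $W$ into $|F|$ level sets $S_1, S_2, \ldots, S_{|F|}$, each of the form $\{x \in W :
Mx \cdot x + \xi \cdot x = \mathrm{const}\}$, then we have from the triangle inequality that

$$\sum_{j=1}^{|F|} |\mathbf{E}_W(1_{S_j} f)| \ge 2\Bigl|\sum_{j=1}^{|F|} \mathbf{E}_W(1_{S_j} f)\Bigr| + \Omega(\eta^{O(1)})$$

and hence, by the identity $\max(y, 0) = (|y| + y)/2$,

$$\sum_{j=1}^{|F|} \max\bigl(\mathbf{E}_{x \in W} 1_{S_j}(x) f(x), 0\bigr) \ge \Omega(\eta^{O(1)})$$

and so by the pigeonhole principle we can find $j$ such that

$$\mathbf{E}_{x \in W} 1_{S_j}(x) f(x) \ge \Omega(\eta^{O(1)})\, \mathbf{P}_W(S_j).$$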
Now we need to take the quadratic surface $S_j$ and partition it into affine spaces.
We first observe that there exists a subspace $U$ of $W$ with dimension

$$\dim_F(U) \ge \frac{1}{2}\dim_F(W) - \frac{3}{2} \ge \frac{1}{2}\dim_F(Z) - O(\eta^{-O(1)})$$

which is null with respect to $M$: see Exercise 4.3.16. Splitting $S_j$ into cosets of $U$,
we see from the pigeonhole principle that there exists a coset $x_1 + U$ such that

$$\mathbf{E}_{x \in x_1 + U} 1_{S_j}(x) f(x) \ge \Omega(\eta^{O(1)})\, \mathbf{P}_{x_1 + U}(S_j),$$

so in particular $S_j \cap (x_1 + U)$ is non-empty and

$$\mathbf{E}_{x \in S_j \cap (x_1 + U)} f(x) \ge \Omega(\eta^{O(1)}).$$

The point of working on a coset $x_1 + U$ of a null space is that the quantity $Mx \cdot x +
\xi \cdot x$ becomes linear with respect to $x$. Thus the intersection of $S_j$ with $x_1 + U$ is
an affine subspace $x_0 + Z'$ of $x_1 + U$ of codimension at most 1. The claim follows.














Iterating this proposition as in the proof of Roth’s theorem, one can eventually
deduce the bound

$$r_4(F^n) = O_{|F|}\left(\frac{|F|^n}{\log^c n}\right) \qquad (11.15)$$

for all $n > 1$ and some absolute constant $c$. It is also possible to adapt the energy
increment argument from Section 10.5, with the concept of quasi-periodic being
replaced with that of being determined by a bounded number of quadratic phase
functions; however, the bounds on $r_4(F^n)$ obtained this way are rather poor. One
can do a bit better by adapting the argument in Theorem 10.27, obtaining the bound

$$r_4(F^n) = O\left(\frac{|F|^n}{n^c}\right);$$

see [161].

It is likely that the above inverse theory extends to higher values of k, but 
there are some technical difficulties in carrying this out, and this has not yet been 
achieved at this time of writing. 

Given the success of the inverse $U^3$ approach in the finite field case,
one is then led to ask whether a similar inverse theorem holds for other groups, such
as cyclic groups $\mathbf{Z}_N$. Here one encounters an interesting phenomenon, which is
that the quadratic phase functions on $\mathbf{Z}_N$ do not form a complete set of obstructions
to Gowers uniformity of order 2. An example is given as follows.


Proposition 11.8 (Furstenberg–Weiss example) Let $N$ be a large integer,
let $M := \lfloor \sqrt{N} \rfloor$, and let $\alpha$ be an irrational number obeying the diophantine
condition $\|n\alpha\|_{\mathbf{R}/\mathbf{Z}} = \Omega(n^{-C})$ for some constant $C > 0$. Define the function
$f : \mathbf{Z}_N \to \mathbf{C}$ by $f(x) := e(\alpha \lfloor x/M \rfloor^2)$ when $x \in [0, M/10) + M \cdot [0, M/10)$, and
$f(x) := 0$ otherwise. Then $\|f\|_{U^3(\mathbf{Z}_N)} = \Theta(1)$, but $\|f\|_{u^3(\mathbf{Z}_N)} = o_{N \to \infty; \alpha}(1)$.

As the name implies, this example was essentially discovered by Furstenberg
and Weiss [126], though in a substantially different language to that presented here
(they constructed a characteristic factor for quadruple recurrence which was not 
given by quadratic eigenfunctions). 


Proof (Sketch) We can write $f(x) = e(\phi(x)) 1_P(x)$, where $P$ is the progression
$P := [0, M/10) + M \cdot [0, M/10)$ and $\phi$ is the phase function $\phi(x) := \alpha \lfloor x/M \rfloor^2$.
One can easily verify that $\phi$ is locally quadratic on $P$, and hence by (11.7)

$$\|f\|_{U^3(\mathbf{Z}_N)} = \|1_P\|_{U^3(\mathbf{Z}_N)} \ge \|1_P\|_{U^1(\mathbf{Z}_N)} = \mathbf{P}_{\mathbf{Z}_N}(P) = \Omega(1).$$

On the other hand, since $f$ is bounded by 1, we have $\|f\|_{U^3(\mathbf{Z}_N)} \le 1$. Thus
$\|f\|_{U^3(\mathbf{Z}_N)} = \Theta(1)$ as claimed.
To prove the second claim, we see from Exercise 11.2.2 that it suffices to show that

$$\mathbf{E}_{x \in \mathbf{Z}_N}\, e(\phi(x) + (c_2 x^2 + c_1 x + c_0)/N)\, 1_P(x) = o_{N \to \infty; \alpha}(1)$$

for all integers $c_0, c_1, c_2$. Writing $x = yM + z$ for $y, z \in [0, M/10)$, it suffices to
show that

$$\mathbf{E}_{y, z \in [0, M/10)}\, e(\alpha y^2 + (c_2 (yM + z)^2 + c_1 (yM + z) + c_0)/N) = o_{N \to \infty; \alpha}(1).$$

To estimate this sum one has two choices. Either one can apply van der Corput’s
lemma (Exercise 11.2.9) twice in the $y$ variable (with $H_1 = M^{1-\varepsilon}$ and $H_2 = M^{1-2\varepsilon}$
for some small $\varepsilon$), and reduce to showing that

$$\mathbf{E}_{h_1 \in [1, H_1];\, h_2 \in [1, H_2]}\, e(2(\alpha + c_2 M^2/N) h_1 h_2) = o_{N \to \infty; \alpha}(1);$$

or one can apply van der Corput’s lemma once in the $y$ variable and once in the $z$
variable to reduce to showing that

$$\mathbf{E}_{h_1 \in [1, H_1];\, h_2 \in [1, H_2]}\, e(2 c_2 M h_1 h_2 / N) = o_{N \to \infty; \alpha}(1).$$

While neither of these two bounds holds uniformly in $c_2$, it turns out that one
of the two bounds is always true, the latter in the “minor arc” case when $c_2 M/N$
is not within $M^{-2+O(\varepsilon)}$ of a rational with denominator at most $M^{O(\varepsilon)}$, and
the former in the complementary “major arc” case. The exact verification of the
bounds requires some basic machinery from Diophantine approximation, but we
omit it as it is somewhat messy.
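
The claim that $\phi$ is locally quadratic on $P$ can be checked exhaustively for small parameters. The Python sketch below is an illustration under stated assumptions, not part of the original argument. It takes the hypothetical values $N = 1607$ and $M = 40$, enumerates every 3-dimensional cube $x + \{0,1\}^3 \cdot (h_1, h_2, h_3)$ in $\mathbf{Z}_N$ whose eight vertices lie in $P$, and verifies that the alternating sum of $\lfloor x/M \rfloor^2$ over the cube vanishes. Since $\phi$ is $\alpha$ times this integer-valued function, an exact integer check suffices and the value of $\alpha$ plays no role.

```python
import math
from itertools import product

N = 1607                   # illustrative modulus
M = math.isqrt(N)          # M = floor(sqrt(N)) = 40
block = range(M // 10)     # the interval [0, M/10) = {0, 1, 2, 3}
P = sorted(y * M + z for y in block for z in block)
P_set = set(P)


def q(x):
    """Integer part of the phase: phi(x) = alpha * q(x)."""
    return (x // M) ** 2


cubes = 0
violations = 0
for x in P:
    # every cube in P has x + h_i in P, so h_i ranges over P - x
    for a, b, c in product(P, repeat=3):
        hs = ((a - x) % N, (b - x) % N, (c - x) % N)
        alternating_sum = 0
        inside = True
        for omega in product((0, 1), repeat=3):
            v = (x + sum(w * h for w, h in zip(omega, hs))) % N
            if v not in P_set:
                inside = False
                break
            alternating_sum += (-1) ** sum(omega) * q(v)
        if not inside:
            continue
        cubes += 1
        if alternating_sum != 0:
            violations += 1
```

The check works because, on $P$, the map $x \mapsto \lfloor x/M \rfloor$ sends cubes to cubes, and the third difference of a quadratic polynomial vanishes.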














This example shows that in addition to the globally quadratic phase obstructions
that appeared in the finite field case, we now must consider locally quadratic phase
obstructions, which are only defined on a suitable progression in the group such
as $[0, M/10) + M \cdot [0, M/10)$. One can alternatively replace progressions with
Bohr sets, which are of course closely related (cf. Section 4.4). A typical inverse
theorem in this setting is as follows.


Theorem 11.9 (Inverse theorem for $U^3(Z)$) [160] Let $Z$ be a finite additive
group of odd order, let $f : Z \to \mathbf{C}$ be a function bounded in magnitude by 1, such
that $\|f\|_{U^3(Z)} \ge \eta$. Then there exists a regular Bohr set $B := B(S, \rho)$ in $Z$ with
$|S| \le O(\eta^{-O(1)})$ and $\rho = \Omega(\eta^{O(1)})$ such that

$$\mathbf{E}_{y \in Z} \|f\|_{u^3(y+B)} = \Omega(\eta^{O(1)}). \qquad (11.16)$$

In particular, there exists $y \in Z$ such that $\|f\|_{u^3(y+B)} = \Omega(\eta^{O(1)})$.


The proof of this theorem is similar to that of Theorem 11.6 which we give 
below, but is somewhat more complicated as we must work with (regular) Bohr 
sets instead of subspaces (which ultimately arises from the application of a version 
of Chang’s theorem, Theorem 4.42, for arbitrary groups). It can then be used to 
prove 


Proposition 11.10 (Lack of uniformity implies density increment) [137],
[138] Let $Z = \mathbf{Z}_N$ be a cyclic group of odd prime order, and let $f : Z \to \mathbf{C}$
have magnitude bounded by 1 and be such that $\mathbf{E}_Z(f) = 0$ and $\|f\|_{U^3(Z)} \ge \eta$ for
some $0 < \eta \le 1$. If $N \ge \exp(O(\eta^{-O(1)}))$, then there exists a proper arithmetic
progression $P$ in $Z$ of length $|P| = \Omega(N^c)$ for some absolute constant $0 < c < 1$
such that

$$\mathbf{E}_{x \in P} f(x) \ge \Omega(\eta^{O(1)}).$$


This result was first established by Gowers$^1$ [137], [138] without directly
proving an inverse theorem. However, the method of proof of Theorem 11.9 in [160] is
based almost entirely on the techniques used in [137] to establish Proposition 11.10.
By the usual iteration arguments, this proposition can be used to establish the bound

$$r_4(\mathbf{Z}_N) = O\left(\frac{N}{(\log\log N)^c}\right)$$

for some absolute constant $0 < c < 1$ and all large $N$; this is the best bound on
$r_4(\mathbf{Z}_N)$ known to date. See [137], [138], [160] for further discussion. In a similar
spirit, Theorem 11.9 can eventually be used to establish the more general result

$$r_4(Z) = O\left(\frac{|Z|}{(\log\log |Z|)^c}\right)$$

for any large finite additive group $Z$; see [160]. It seems likely that this bound
can be improved to $O(|Z| / \log^c |Z|)$ by using the arguments in Theorem 10.27 or
Theorem 10.30 but this will probably be quite messy.


Exercises 


11.2.1 Prove (11.11). 

11.2.2 Let $\mathbf{Z}_N$ be a cyclic group (and thus also a ring), and let $\phi : \mathbf{Z}_N \to \mathbf{R}/\mathbf{Z}$
be a phase polynomial of degree less than $d$. Show that there exist
$c_0, c_1, \ldots, c_{d-1} \in \mathbf{Z}_N$ such that $\phi(x) = (c_{d-1} x^{d-1} + \cdots + c_1 x + c_0)/N$
for all $x \in \mathbf{Z}_N$, where the map $x \mapsto x/N$ is defined from $\mathbf{Z}_N$ to $\mathbf{R}/\mathbf{Z}$ in
the obvious manner. Conversely, every function of this form is a phase
polynomial of degree less than $d$. Thus in the cyclic case, the concept of
a phase polynomial collapses to the usual definition of a polynomial.

11.2.3 Prove (11.12). (You may need to reflect some of the variables or take
conjugates to eliminate a $(-1)^d$ factor.)

11.2.4 Let $f : Z \to \mathbf{C}$ be a function bounded in magnitude by 1, and let $d \ge 1$.
Show that $\|f\|_{u^d(Z)}, \|f\|_{U^d(Z)} \le 1$, and that $\|f\|_{u^d(Z)} = 1$ if and only if
$\|f\|_{U^d(Z)} = 1$.


1 The original argument in [137] had an exponential dependence on $\eta$ rather than a polynomial one for
$\mathbf{E}_{x \in P} f(x)$, leading ultimately to the weaker bound of $O(N/(\log\log\log N)^c)$ for $r_4(\mathbf{Z}_N)$. This is due to a
reliance on Freiman’s theorem instead of a Chang–Bogolyubov type theorem; the problem being that
the Freiman theorem employed (essentially Theorem 5.32) suffers an exponential loss in an
unfavorable location.


11.2.5 Show that the $u^d(Z)$ norm enjoys the same invariances that the $U^d(Z)$
norm did in Exercises 11.1.8, 11.1.9, 11.1.10, as well as an analog of
(11.7). Show that the more general $u^d(B)$ norms also obey a suitable
analog of Exercises 11.1.9, 11.1.10.

11.2.6 [160] Let $F = F_p$ be a finite field of odd prime order, and let $Z$ be
a finite-dimensional vector space over $F$, with the usual bilinear form.
Show that if $\phi : Z \to \mathbf{R}/\mathbf{Z}$ is a quadratic phase function, then we have the
representation $\phi(x) = Mx \cdot x + \xi \cdot x + c$ for some unique $c \in \mathbf{R}/\mathbf{Z}$, $\xi \in
Z$, and a self-adjoint $F$-linear operator $M : Z \to Z$. Conversely, every
function of this form is a quadratic phase function. What happens if
$F = F_2$ has order 2?

11.2.7 [160] (Quadratic Hahn–Banach theorem) Let $F$ and $Z$ be as in the
preceding exercise, and let $Z'$ be a subspace of $Z$. Show that any quadratic phase
function on $Z'$ can be extended (possibly non-uniquely) to a quadratic
phase function on $Z$. Conclude in particular that for any $f : Z \to \mathbf{C}$
we have $\|f\|_{U^3(Z)} \ge \|f\|_{u^3(Z)} \ge \mathbf{P}_Z(Z') \sup_{y \in Z} \|f\|_{u^3(y+Z')}$; this can be
viewed as a kind of converse to Theorem 11.6.

11.2.8 Use Proposition 11.7 and (11.8) to establish (11.15).

11.2.9 (Van der Corput lemma) If $1 \le H \le M$ and $f : [0, M) \to \mathbf{C}$ is a function
bounded in magnitude by 1, show that

$$|\mathbf{E}_{x \in [0, M)} f(x)|^2 \le O\bigl(\mathbf{E}_{1 \le h \le H} |\mathbf{E}_{x \in [0, M-h)} f(x+h) \overline{f(x)}|\bigr) + O\left(\frac{H}{M}\right) + O\left(\frac{1}{H}\right).$$

(Hint: extend $f$ by zero to the integers $\mathbf{Z}$, and first obtain a preliminary upper
bound in terms of the averages $\mathbf{E}_{0 \le h < H} f(x + h)$, up to an error of $O(H/M)$.)
Compare this with Lemma 11.3.





11.3 Proof of Theorem 11.6 


In this section we give a proof of Theorem 11.6. Let us fix $F$, $Z$, $f$, $\eta$ with the
above properties. The proof proceeds in several stages.


11.3.1 Locating a somewhat linear phase derivative 


The first step is to apply the inverse theorem (11.9) for the $U^2(Z)$ norm. From the
recursive definition of the $U^3(Z)$ norm we have

$$\mathbf{E}_{h \in Z} \|T^h f\, \overline{f}\|_{U^2(Z)}^4 \ge \eta^8$$

and hence by (11.9)

$$\mathbf{E}_{h \in Z} \|T^h f\, \overline{f}\|_{u^2(Z)}^2 \ge \eta^8.$$

If we let $H \subset Z$ be the set

$$H := \{h \in Z : \|T^h f\, \overline{f}\|_{u^2(Z)}^2 \ge \eta^8/2\}$$

then we have

$$\mathbf{E}_{h \in Z} \|T^h f\, \overline{f}\|_{u^2(Z)}^2 \le \eta^8/2 + \mathbf{P}_Z(H)$$

and hence

$$\mathbf{P}_Z(H) \ge \eta^8/2. \qquad (11.17)$$

By definition of $H$, we can thus find a function $\xi : H \to Z$ such that

$$|\mathbf{E}_{x \in Z} T^h f(x) \overline{f(x)} e(-\xi(h) \cdot x)|^2 \ge \eta^8/2 \qquad (11.18)$$

for all $h \in H$. Informally, if we use $\phi(x)$ to denote the phase of $f(x)$, this estimate is
asserting that $\phi(x + h) - \phi(x) - \xi(h) \cdot x$ is in some sense approximately constant
in $x$, so that $\phi(x + h) - \phi(x)$ is approximately linear. The challenge is thus to
“integrate” this fact and conclude that $\phi$ is somehow approximately quadratic. To
do this, the first task shall be to obtain some linearity of $\xi(h)$ (this reflects the
fact that we expect the quantity $(h \cdot \nabla)\phi$ to somehow be linear in $h$). We sum the
preceding expression over all $h \in H$ using (11.17) and conclude

$$\mathbf{E}_{h \in Z} 1_H(h) |\mathbf{E}_{x \in Z} T^h f(x) \overline{f(x)} e(-\xi(h) \cdot x)|^2 \ge \eta^{16}/4.$$

Expanding this out as in Lemma 11.3 we conclude

$$|\mathbf{E}_{x, h, k \in Z} 1_H(h)\, T^{h+k} f(x)\, \overline{T^k f(x)}\, \overline{T^h f(x)}\, f(x)\, e(-\xi(h) \cdot k)| \ge \eta^{16}/4.$$

In order to focus on $\xi$, we suppress the explicit mention of the functions $f$ using
the b() notation. After collecting some terms we obtain

$$|\mathbf{E}_{x, h, k \in Z}\, b(x + h, k)\, b(x, k)\, 1_H(h)\, e(-\xi(h) \cdot k)| \ge \eta^{16}/4.$$

We can eliminate the $b(x, k)$ factor using Lemma 11.3, concluding that

$$|\mathbf{E}_{x, h, h_1, k \in Z}\, b(x + h, k)\, b(x + h + h_1, k)\, 1_H(h)\, 1_H(h + h_1)\, e((h_1 \cdot \nabla)\xi(h) \cdot k)| \ge \eta^{32}/16.$$

Making the substitution $y = x + h$ and collecting some terms this becomes

$$|\mathbf{E}_{y, h, h_1, k \in Z}\, b(y, k, h_1)\, 1_H(h)\, 1_H(h + h_1)\, e((h_1 \cdot \nabla)\xi(h) \cdot k)| \ge \eta^{32}/16.$$

We can eliminate $b(y, k, h_1)$ using Lemma 11.3, concluding that

$$|\mathbf{E}_{y, h, h_1, h_2, k \in Z}\, 1_H(h)\, 1_H(h + h_1)\, 1_H(h + h_2)\, 1_H(h + h_1 + h_2)\, e((h_2 \cdot \nabla)(h_1 \cdot \nabla)\xi(h) \cdot k)| \ge \eta^{64}/256.$$

The point of eliminating all the b() factors now becomes clear, as the $y$ averaging
can be dropped, and we can sum the $k$ sum using Lemma 4.5, to obtain

$$|\mathbf{E}_{h, h_1, h_2 \in Z}\, 1_H(h)\, 1_H(h + h_1)\, 1_H(h + h_2)\, 1_H(h + h_1 + h_2)\, \mathbf{I}((h_2 \cdot \nabla)(h_1 \cdot \nabla)\xi(h) = 0)| \ge \eta^{64}/256$$

or in other words

$$\mathbf{P}_{h, h_1, h_2 \in Z}\bigl(h, h + h_1, h + h_2, h + h_1 + h_2 \in H;\ \xi(h + h_1 + h_2) - \xi(h + h_1) - \xi(h + h_2) + \xi(h) = 0\bigr) \ge \eta^{64}/256.$$

This is an assertion that $\xi$ behaves approximately like a Freiman homomorphism
of order 2; observe that the unknown function $f$ and the phase oscillations have
completely disappeared from this estimate. This allows us to now employ the tools
of additive combinatorics.
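
To see what the frequency function $\xi(h)$ looks like in the most favourable case, the Python sketch below (an illustration, not part of the original proof) works on the cyclic group $\mathbf{Z}_N$ rather than a vector space over a finite field. It assumes the conventions $T^h f(x) = f(x + h)$ and $\xi \cdot x = \xi x / N$, and picks $\xi(h)$ as the frequency with the largest Fourier coefficient of $T^h f\, \overline{f}$. The names `fourier_of_derivative`, `xi_of` and the values $N = 11$, $a = 3$ are hypothetical. For the quadratic phase $f(x) = e(a x^2/N)$ the function $\xi$ comes out exactly linear, whereas for a cubic phase no single frequency dominates.

```python
import cmath


def e(t):
    """e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * cmath.pi * t)


N = 11  # odd prime, chosen for illustration
a = 3


def fourier_of_derivative(f, h):
    """Fourier coefficients of x -> f(x + h) * conj(f(x)) on Z_N."""
    g = [f[(x + h) % N] * f[x].conjugate() for x in range(N)]
    return [sum(g[x] * e(-xi * x / N) for x in range(N)) / N for xi in range(N)]


def xi_of(f, h):
    """Frequency at which the derivative T^h f conj(f) has its largest coefficient."""
    coeffs = fourier_of_derivative(f, h)
    return max(range(N), key=lambda xi: abs(coeffs[xi]))


# Quadratic phase: the derivative is a pure linear phase with frequency 2*a*h.
quad = [e(a * x * x / N) for x in range(N)]
xi = [xi_of(quad, h) for h in range(N)]

# xi(h + h1 + h2) - xi(h + h1) - xi(h + h2) + xi(h) = 0 for every h, h1, h2
additive = all(
    (xi[(h + h1 + h2) % N] - xi[(h + h1) % N] - xi[(h + h2) % N] + xi[h]) % N == 0
    for h in range(N) for h1 in range(N) for h2 in range(N)
)

# Cubic phase: for h = 1 every coefficient has squared magnitude 1/N (a Gauss sum).
cubic = [e(x ** 3 / N) for x in range(N)]
max_sq_cubic = max(abs(c) ** 2 for c in fourier_of_derivative(cubic, 1))
```

Here `xi[h]` equals $2ah \bmod N$, so the final probability in the text is equal to 1 for this example.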


11.3.2 Obtaining a perfectly linear phase derivative 


We now convert the somewhat linear phase function $\xi(h)$ into a genuinely linear
phase function. This shall be done using the inverse sum set technology of previous
chapters, though one needs to be a little careful to make sure that the density bounds
one obtains are polynomial in the $\eta$ parameter rather than exponential.

Let $\Gamma \subset Z \times Z$ denote the set

$$\Gamma := \{(h, \xi(h)) : h \in H\};$$

then the above statement can be rephrased as a lower bound on the additive energy
of $\Gamma$ (see Definition 2.8):

$$E(\Gamma, \Gamma) \ge \eta^{64} |Z|^3 / 256.$$

On the other hand, we have $|\Gamma| \le |H| \le |Z|$. We can thus apply the
Balog–Szemerédi–Gowers theorem, Theorem 2.31, to conclude that there exists an
$O(\eta^{-O(1)})$-approximate group $G \subset Z \times Z$ of cardinality $O(\eta^{-O(1)} |Z|)$, such that

$$|\Gamma \cap (G + (h_0, \xi_0))| = \Omega(\eta^{O(1)} |Z|)$$

for some $(h_0, \xi_0) \in Z \times Z$. In particular $|G| = \Omega(\eta^{O(1)} |Z|)$. We can analyze $G$
further using a Freiman-type or Chang-type theorem. There are many ways to do
this; we shall use Corollary 5.29. This shows that $2G - 2G$ contains a subspace
$V$ of $Z \times Z$ of size

$$|V| = \Omega(|F|^{-O(\eta^{-O(1)})} |G|) = \Omega(|F|^{-O(\eta^{-O(1)})} |Z|),$$

or in other words

$$\dim_F(V) \ge \dim_F(Z) - O(\eta^{-O(1)}).$$

Since $G$ is an $O(\eta^{-O(1)})$-approximate group, we see that

$$|G + V| \le |G + 2G - 2G| \le O(\eta^{-O(1)}) |G| = O(\eta^{-O(1)}) |Z|$$

and thus

$$|\Gamma \cap (G + V + (h_0, \xi_0))| \ge |\Gamma \cap (G + (h_0, \xi_0))| = \Omega(\eta^{O(1)} |Z|) = \Omega(\eta^{O(1)} |G + V|).$$

Splitting $G + V$ into $|G + V|/|V|$ cosets of $V$ (this is a very special case of the
Ruzsa covering lemma, Lemma 2.14) and using the pigeonhole principle, we can
thus find a coset $V + (h_1, \xi_1)$ of $V$ such that

$$|\Gamma \cap (V + (h_1, \xi_1))| = \Omega(\eta^{O(1)} |V|). \qquad (11.19)$$

Thus we have replaced the approximate group $G$ with a genuine subspace $V$,
though $V$ is somewhat smaller than $G$.

Let $V_0 := V \cap (\{0\} \times Z)$ denote the vertical component of $V$. Since $\Gamma$ is a
(partial) graph, we see that all the sums in $\Gamma + V_0$ are distinct. In particular we
have

$$|V_0|\, |\Gamma \cap (V + (h_1, \xi_1))| = |(\Gamma \cap (V + (h_1, \xi_1))) + V_0| \le |V + (h_1, \xi_1)| = |V|,$$

which, when combined with (11.19), gives the bound $|V_0| = O(\eta^{-O(1)})$. Now from
elementary linear algebra we can write $V = V_0 + V_1$, where $V_1 = \{(h, Mh) : h \in
W_1\}$ is the graph of a linear transformation $M : W_1 \to Z$, and $W_1$ is a subspace of
$Z$ of dimension

$$\dim_F(W_1) = \dim_F(V_1) = \dim_F(V) - \dim_F(V_0) \ge \dim_F(Z) - O(\eta^{-O(1)}).$$

Covering $V$ by translates of $V_1$ and applying the pigeonhole principle to (11.19),
we can find a coset $V_1 + (h_2, \xi_2)$ of $V_1$ such that

$$|\Gamma \cap (V_1 + (h_2, \xi_2))| = \Omega(\eta^{O(1)} |V_1|).$$

Unfolding the definition of $\Gamma$, we conclude that

$$\mathbf{P}_{h \in W_1}\bigl(h + h_2 \in H;\ \xi(h + h_2) = Mh + \xi_2\bigr) = \Omega(\eta^{O(1)}).$$


Thus we have established that $\xi$ exhibits exact linear behavior on a large fraction
of a coset $h_2 + W_1$. Recalling the definition (11.18) of $\xi(h)$, we thus have

$$\mathbf{P}_{h \in W_1}\bigl(|\mathbf{E}_{x \in Z} T^{h + h_2} f(x) \overline{f(x)} e(-(Mh + \xi_2) \cdot x)|^2 \ge \eta^8/2\bigr) = \Omega(\eta^{O(1)}). \qquad (11.20)$$

Ignoring the lower-order terms $h_2$ and $\xi_2$, and writing $\phi(x)$ for the phase of $f(x)$,
this estimate is informally asserting that $(h \cdot \nabla_x)\phi(x) \approx Mh \cdot x$ for a large fraction
of $x$ and $h$. We would like to somehow “integrate” this and conclude that $\phi(x)$
behaves like $\frac{1}{2} Mx \cdot x$. However it turns out that to achieve this we need to ensure
that $M$ is somehow “symmetric”. This is the purpose of the next stage of the
argument.


11.3.3 Symmetrizing the derivative 


We now show that (11.20) forces a certain symmetry property on $M$. Note that
this estimate implies that

$$|\mathbf{E}_{h \in W_1}\, b(h)\, \mathbf{E}_{x \in Z}\, T^{h + h_2} f(x) \overline{f(x)} e(-(Mh + \xi_2) \cdot x)| = \Omega(\eta^{O(1)})$$

for some choice of bounded function $b(h)$. We will focus on the term $e(-Mh \cdot x)$,
and conceal all the other terms using the b() notation, thus obtaining

$$|\mathbf{E}_{h \in W_1;\, x \in Z}\, b(h)\, b(x)\, b(x + h)\, e(-Mh \cdot x)| = \Omega(\eta^{O(1)}).$$

Splitting $Z$ into cosets of $W_1$ and using the pigeonhole principle, we can find
$x_1 \in Z$ such that

$$|\mathbf{E}_{x, h \in W_1}\, b(h)\, b(x + x_1)\, b(x + h + x_1)\, e(-Mh \cdot (x + x_1))| = \Omega(\eta^{O(1)});$$

absorbing the $x_1$ factors into the b() notation we conclude

$$|\mathbf{E}_{x, h \in W_1}\, b(h)\, b(x)\, b(x + h)\, e(-Mh \cdot x)| = \Omega(\eta^{O(1)}).$$

Now we proceed as in previous steps, using Lemma 11.3 to eliminate the b() terms,
though this time we get rid of the variables in a slightly different way. First we
eliminate the $b(h)$ factor using Lemma 11.3 to conclude

$$|\mathbf{E}_{x, y, h \in W_1}\, b(x)\, b(y)\, b(x + h)\, b(y + h)\, e(-Mh \cdot (y - x))| = \Omega(\eta^{O(1)}).$$

Normally we would make the substitution $y = x + h'$, but instead we make the
substitution $z = x + y + h$ to obtain

$$|\mathbf{E}_{x, y, z \in W_1}\, b(z, x)\, b(z, y)\, e(-M(z - x - y) \cdot (y - x))| = \Omega(\eta^{O(1)}). \qquad (11.21)$$
Since

$$e(-M(z - x - y) \cdot (y - x)) = e(Mx \cdot y - My \cdot x)\, e(-Mz \cdot y + My \cdot y)\, e(Mz \cdot x - Mx \cdot x)$$

we conclude (after absorbing some factors into the b() terms) that

$$|\mathbf{E}_{x, y, z \in W_1}\, b(z, x)\, b(z, y)\, e(Mx \cdot y - My \cdot x)| = \Omega(\eta^{O(1)}).$$

Pigeonholing in $z$, we derive

$$|\mathbf{E}_{x, y \in W_1}\, b(x)\, b(y)\, e(Mx \cdot y - My \cdot x)| = \Omega(\eta^{O(1)}).$$

We eliminate $b(x)$ using Lemma 11.3 to conclude

$$|\mathbf{E}_{x, h, y \in W_1}\, b(y)\, b(y + h)\, e(Mx \cdot h - Mh \cdot x)| = \Omega(\eta^{O(1)}).$$

Applying the triangle inequality to eliminate $b(y)$, $b(y + h)$ we deduce

$$\mathbf{E}_{h, y \in W_1} |\mathbf{E}_{x \in W_1}\, e(Mx \cdot h - Mh \cdot x)| = \Omega(\eta^{O(1)}).$$

Introduce the symmetry space

$$W := \{h \in W_1 : Mx \cdot h = Mh \cdot x \text{ for all } x \in W_1\}.$$

Then from Lemma 4.5 we have

$$\mathbf{E}_{x \in W_1}\, e(Mx \cdot h - Mh \cdot x) = \mathbf{I}(h \in W)$$

and thus

$$|W| / |W_1| = \Omega(\eta^{O(1)}).$$

In particular we have

$$\dim_F(W) \ge \dim_F(W_1) - O\left(\log \frac{1}{\eta}\right) \ge \dim_F(Z) - O(\eta^{-O(1)}).$$
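
The role of the symmetry space can be illustrated on a tiny example. The Python sketch below is not part of the original argument. It takes $W_1 = Z = F_3^3$ with the standard dot product, assumes the convention $e(a \cdot b) = \exp(2\pi i\, a \cdot b / p)$ for $a \cdot b \in F_p$, and uses a hypothetical non-symmetric matrix `M`. It computes $W$ directly from the definition and checks that the average $\mathbf{E}_{x}\, e(Mx \cdot h - Mh \cdot x)$ equals 1 when $h \in W$ and 0 otherwise.

```python
import cmath
from itertools import product


def e(t):
    """e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * cmath.pi * t)


p = 3
n = 3
M = [[1, 1, 0],
     [0, 2, 0],
     [0, 0, 1]]  # hypothetical example; M is not symmetric

vectors = list(product(range(p), repeat=n))


def apply(mat, v):
    """Matrix-vector product over F_p."""
    return tuple(sum(a * b for a, b in zip(row, v)) % p for row in mat)


def dot(u, v):
    """Standard bilinear form on F_p^n."""
    return sum(a * b for a, b in zip(u, v)) % p


def asym(x, h):
    """The quantity Mx.h - Mh.x in F_p."""
    return (dot(apply(M, x), h) - dot(apply(M, h), x)) % p


# symmetry space W = {h : Mx.h = Mh.x for all x}
W = [h for h in vectors if all(asym(x, h) == 0 for x in vectors)]

# averages E_x e(Mx.h - Mh.x), one for each h
averages = {
    h: sum(e(asym(x, h) / p) for x in vectors) / len(vectors)
    for h in vectors
}
```

For this choice of `M` the space $W$ is the line spanned by the third basis vector, so $|W|/|W_1| = 1/9$.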


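Returning to (11.20), we see from covering $W_1$ by translates of $W$ and using the
pigeonhole principle that there exists $h_3 \in Z$ such that

$$\mathbf{P}_{h \in W}\bigl(|\mathbf{E}_{x \in Z} T^{h + h_2 + h_3} f(x) \overline{f(x)} e(-(M(h + h_3) + \xi_2) \cdot x)|^2 \ge \eta^8/2\bigr) = \Omega(\eta^{O(1)}). \qquad (11.22)$$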


11.3.4 Eliminating the quadratic phase 


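We are now ready to finish the proof of Theorem 11.6. From (11.22) we see in
particular that

$$|\mathbf{E}_{h \in W}\, b(h)\, \mathbf{E}_{x \in Z}\, T^{h + h_2 + h_3} f(x) \overline{f(x)} e(-(M(h + h_3) + \xi_2) \cdot x)| = \Omega(\eta^{O(1)})$$

for some bounded $b(h)$. We now focus on the $\overline{f(x)}$ term and conceal many of the
other terms using the b() notation, obtaining

$$|\mathbf{E}_{h \in W;\, x \in Z}\, b(h)\, b(x + h)\, \overline{f(x)}\, e(-Mh \cdot x)| = \Omega(\eta^{O(1)}),$$

where we used the identity $e(\xi \cdot x) = e(\xi \cdot (x + h)) e(-\xi \cdot h)$ to eliminate the phase
terms which were linear in $x$. Splitting $x$ into cosets of $W$ and using the triangle
inequality we conclude

$$|\mathbf{E}_{y \in Z} \mathbf{E}_{h, x \in W}\, b(h)\, b(x + y + h)\, \overline{f(x + y)}\, e(-Mh \cdot (x + y))| = \Omega(\eta^{O(1)}),$$

which we rewrite as

$$|\mathbf{E}_{y \in Z} \mathbf{E}_{h, x \in W}\, b(h, y)\, b(x + h, y)\, \overline{f(x + y)}\, e(-Mh \cdot x)| = \Omega(\eta^{O(1)}).$$

By construction of $W$, we know that $Mh \cdot x = Mx \cdot h$ for all $x, h \in W$. We now
divide into two cases depending on whether $F$ has characteristic 2 or not. If $F$ has
odd characteristic, then we have

$$e(-Mh \cdot x) = e\left(-\tfrac{1}{2} M(x + h) \cdot (x + h)\right) e\left(\tfrac{1}{2} Mx \cdot x\right) e\left(\tfrac{1}{2} Mh \cdot h\right).$$

If we then set $f_y : W \to \mathbf{C}$ to be the function

$$f_y(x) := \overline{f(x + y)}\, e\left(\tfrac{1}{2} Mx \cdot x\right)$$

we conclude that

$$|\mathbf{E}_{y \in Z} \mathbf{E}_{h, x \in W}\, b(h, y)\, b(x + h, y)\, f_y(x)| = \Omega(\eta^{O(1)}).$$

On the other hand, from Lemma 11.4 (after a linear change of variables) we see
that

$$|\mathbf{E}_{h, x \in W}\, b(h, y)\, b(x + h, y)\, f_y(x)| \le \|f_y\|_{U^2(W)}$$

and thus by (11.9)

$$\mathbf{E}_{y \in Z} \|f_y\|_{u^2(W)}^{1/2} = \Omega(\eta^{O(1)}).$$

By Cauchy–Schwarz we conclude

$$\mathbf{E}_{y \in Z} \|f_y\|_{u^2(W)} = \Omega(\eta^{O(1)}).$$

Since the $u^3(W)$ norm controls the $u^2(W)$ norm, and neither the quadratic phase
$e(\frac{1}{2} Mx \cdot x)$ nor complex conjugation affects the $u^3(W)$ norm, we have

$$\|f_y\|_{u^2(W)} \le \|f_y\|_{u^3(W)} = \|f(\cdot + y)\|_{u^3(W)} = \|f\|_{u^3(y + W)}$$

and we obtain (11.14) as desired.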

Now we argue for the case when $F$ has characteristic 2, using an observation of
Alex Samorodnitsky (private communication). Since $M$ is symmetric on $W$, the
function $x \mapsto Mx \cdot x$ is in fact linear on $W$ (here we rely on the characteristic 2
hypothesis). Thus we can write $Mx \cdot x = \xi \cdot x$ for some $\xi \in W$. By passing to the
orthogonal complement of $\xi$ in $W$ if necessary we may assume that $\xi = 0$, thus
$Mx \cdot x = 0$ for all $x \in W$. This allows us to find a transformation $A : W \to W$
such that $Mh \cdot x = Ah \cdot x + Ax \cdot h$; for instance, one can write $M$ as a matrix
with coefficients in $F$, use the hypothesis $Mx \cdot x = 0$ to show that the matrix has zero
diagonal, and then take $A$ to be the upper triangular portion of $M$. We then have

$$e(-Mh \cdot x) = e(-A(x + h) \cdot (x + h))\, e(Ax \cdot x)\, e(Ah \cdot h)$$

and the rest of the argument proceeds as before.
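
Both completing-the-square identities used above reduce to identities between elements of $F$, so they can be verified exhaustively in small cases. The Python sketch below is an illustration with hypothetical matrices and is not part of the original proof. It checks that $-Mh \cdot x = s\,(-A(x + h) \cdot (x + h) + Ax \cdot x + Ah \cdot h)$ for all $x, h$, first with $A = M$ symmetric and $s = 1/2$ over $F_5$, then with $s = 1$ over $F_2$, where $M$ is symmetric with zero diagonal and $A$ is its upper triangular part. A non-symmetric matrix is included to show that the symmetry of $M$ is really needed.

```python
from itertools import product


def apply(mat, v, p):
    """Matrix-vector product over F_p."""
    return tuple(sum(a * b for a, b in zip(row, v)) % p for row in mat)


def dot(u, v, p):
    """Standard bilinear form on F_p^n."""
    return sum(a * b for a, b in zip(u, v)) % p


def add(u, v, p):
    """Vector addition in F_p^n."""
    return tuple((a + b) % p for a, b in zip(u, v))


def identity_holds(M, A, scale, p):
    """Check -Mh.x == scale * (-A(x+h).(x+h) + Ax.x + Ah.h) in F_p for all x, h."""
    n = len(M)
    for x in product(range(p), repeat=n):
        for h in product(range(p), repeat=n):
            s = add(x, h, p)
            lhs = (-dot(apply(M, h, p), x, p)) % p
            rhs = (scale * (-dot(apply(A, s, p), s, p)
                            + dot(apply(A, x, p), x, p)
                            + dot(apply(A, h, p), h, p))) % p
            if lhs != rhs:
                return False
    return True


# Odd characteristic: p = 5, M symmetric, A = M, scale = 1/2 = 3 in F_5.
M5 = [[1, 2],
      [2, 3]]
half = (5 + 1) // 2
odd_ok = identity_holds(M5, M5, half, 5)

# Without symmetry the same identity fails.
M5_bad = [[1, 2],
          [0, 3]]
bad_ok = identity_holds(M5_bad, M5_bad, half, 5)

# Characteristic 2: M symmetric with zero diagonal, A = upper triangular part.
M2 = [[0, 1, 1],
      [1, 0, 1],
      [1, 1, 0]]
A2 = [[0, 1, 1],
      [0, 0, 1],
      [0, 0, 0]]
even_ok = identity_holds(M2, A2, 1, 2)
```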





Remark 11.11 The fact that we have to pass from the original space Z to a 
subspace W of somewhat lower dimension is a defect of the argument. If one 
knew the polynomial Freiman—Ruzsa conjecture (Conjecture 5.34) one could set 
W = Z, which would lead to somewhat stronger results in applications. 


We now comment briefly on extending these arguments to higher $k$, to obtain
Szemerédi’s theorem in general. At this time of writing the inverse $U^3$ theorem
has not been extended to higher $k$, even in the simple case of a vector space over a
finite field. However, Proposition 11.10 has been extended successfully to general
$k$:


Proposition 11.12 (Lack of uniformity implies density increment) [138] Let
$Z = \mathbf{Z}_N$ be a cyclic group of odd prime order, let $k \ge 3$, and let $f : Z \to \mathbf{C}$
have magnitude bounded by 1 and be such that $\mathbf{E}_Z(f) = 0$ and $\|f\|_{U^{k-1}(Z)} \ge \eta$ for some
$0 < \eta \le 1$. If $N \ge \exp(O_k(\eta^{-O_k(1)}))$, then there exists a proper arithmetic
progression $P$ in $Z$ of length $|P| = \Omega(N^{c_k})$ for some absolute constant $0 < c_k < 1$ such
that

$$\mathbf{E}_{x \in P} f(x) = \Omega_k(\eta^{O_k(1)}).$$

This leads ultimately to the bound

$$r_k(\mathbf{Z}_N) = O_k\left(\frac{N}{(\log\log N)^{c_k}}\right)$$

for all $k \ge 3$ and large $N$, where $c_k > 0$ depends only on $k$; in fact in [138] the
explicit value $c_k = 1/2^{2^{k+9}}$ is attained. This is currently the best bound known for
$r_k(\mathbf{Z}_N)$ for general $k \ge 4$ and large $N$. It is however likely that this can be improved
to $O_k(N / \log^{c_k} N)$ based on analogy with the $k = 3$ case.

The proof of Proposition 11.12 is quite lengthy and difficult. In principle, one
wishes to induct on $k$, leveraging inverse theorems for $U^{k-2}$ to obtain inverse
theorems for $U^{k-1}$. This was the strategy employed at the start of the proof of
Theorem 11.6, using the simple inverse theorem (11.9) for $U^2$ to create the partially
defined derivative $\xi()$, which one then obtains arithmetic structure on.
Unfortunately this strategy has not yet been made to work even for $k = 5$ and for the
model case of a vector space over a finite field, mainly because the inverse
theorem for $U^3$ is much weaker than that for $U^2$, in particular involving an unknown
space $W$ (or a Bohr set $B$), which will ultimately depend on a certain shift
parameter $h$ in an unpleasant way. To prove Proposition 11.12, Gowers employed a
slightly different approach, starting with the original function $f$ and taking $k - 3$
“derivatives” $f \mapsto T^h f\, \overline{f}$ to reduce the $U^{k-1}$ norm to the $U^2$ norm.
Employing the $U^2$ inverse theorem, one then obtains a $(k-3)$-fold derivative function
$\xi(h_1, \ldots, h_{k-3})$. The strategy is then to establish some multilinearity properties
of this function $\xi$ in order to execute a similar scheme to the one described above.
This requires a substantial amount of new combinatorial technology, not least of
which is a multilinear version of the Balog–Szemerédi–Gowers theorem, which
cannot be established simply by applying the Balog–Szemerédi–Gowers theorem
separately in each variable (again because of the issue that the structures obtained
in this way for one variable will depend on the other variables). See [138] for
details.


Exercises 


11.3.1 (Alex Samorodnitsky, private communication) Let $f : Z \to \mathbf{C}$, and let
$D : Z \times Z \to \mathbf{R}^+$ denote the quantity $D(h, \xi) := |\widehat{T^h f \overline{f}}(\xi)|^2$. Establish
the identity

$$\mathbf{E}_{h_1,h_2,h_3,h_4 \in Z:\, h_1 + h_2 = h_3 + h_4} \sum_{\xi_1,\xi_2,\xi_3,\xi_4 \in Z:\, \xi_1 + \xi_2 = \xi_3 + \xi_4} \prod_{j=1}^{4} D(h_j, \xi_j) = \sum_{\xi \in Z} \mathbf{E}_{h \in Z} D(h, \xi)^4.$$

(Hint: first show that $D$ is essentially its own Fourier transform.) This
identity can be used as a substitute for the first part of the above argument.


11.4 Soft obstructions to uniformity 


In the last two sections we described the approach of Gowers in proving Szemerédi’s
theorem. There were three main components to the argument. First, there
was the generalized von Neumann theorem (11.8) which showed among other
things that one could approximate $\Lambda_k(f, \ldots, f)$ by $\mathbf{E}_Z(f)^k$ as long as $f - \mathbf{E}_Z(f)$
was sufficiently Gowers uniform of order $k - 2$. Second, there was the inverse
theorem, which implied that if $f - \mathbf{E}_Z(f)$ was not Gowers uniform of order $k - 2$
then there was enough structure on $f$ to conclude a density increment for $f$ on
a subspace or sub-progression of Z. Finally there was the standard density incre- 
mentation argument that iterated the previous two observations to conclude the 
proof of Szemerédi’s theorem. 

Of the three components mentioned above, the second was by far the most 
difficult. The reason is that this approach requires a rather strong type of inverse 
theorem, and in particular requires one to give quite “concrete” or “hard” obstruc- 
tions to Gowers uniformity, in order to conclude the desired density increment. 
There is however an alternative approach, similar to the finitary ergodic argu- 
ment given in Section 10.5, which requires much “softer” obstructions to Gowers 
uniformity, in the sense that these obstructions are not presented in as explicit a 
form as, say, a polynomial phase function. This makes the second stage of the 
argument immensely simpler. However, one must now make the third stage of the 
argument more complicated, replacing the density incrementation argument by an 
energy incrementation argument, and then establishing some sort of recurrence 
result for the soft obstructions. This last step now becomes rather difficult, for 
instance involving van der Waerden’s theorem. One consequence of this is that 
the quantitative bounds obtained by this method are extremely poor. Nevertheless, 
this approach is quite robust, requiring very little arithmetic structure as compared 
with Gowers’ approach. 

To describe this approach to Szemerédi’s theorem, let us first review the ingredients
used in the finitary ergodic proof of Roth’s theorem in Section 10.5. The
strategy was to approximate the original function $f$ by some low complexity
approximation $f_{U^\perp}$, such that the error $f_U := f - f_{U^\perp}$ was suitably uniform. One
achieves this iteratively: if one has some preliminary approximation $f_{U^\perp}$ whose
error $f_U$ is not sufficiently uniform, then one concludes that $f_U$ correlates with a
certain obstruction to uniformity, which in this case was a character $e_\xi$. One then
constructs a $\sigma$-algebra out of this obstruction $e_\xi$ and uses that algebra to refine the
approximation $f_{U^\perp}$ to $f$, increasing the energy ($L^2$ norm) of $f_{U^\perp}$ in the process.
One repeats this procedure until the error finally becomes uniform (and hence
negligible). The only remaining task is then to establish some recurrence property
for the approximation $f_{U^\perp}$, namely a lower bound on $\Lambda_3(f_{U^\perp}, \ldots, f_{U^\perp})$. The key
here was that the approximation $f_{U^\perp}$ was built out of the $\sigma$-algebras associated
to characters, and was hence almost periodic; this led to a non-trivial recurrence
property for $f_{U^\perp}$.

The above argument used Fourier analysis by involving the characters $e_\xi$. However,
one could replace this family of functions by any other family of functions,
provided that two properties hold: firstly, that there were enough functions to
provide a complete set of obstructions to Gowers uniformity of order $k - 2$, and
secondly, that any function generated by these functions (or more precisely by
their associated $\sigma$-algebras) had enough “almost periodicity” to lead to recurrence.
Using this observation, it becomes possible to dispense with Fourier analysis alto- 
gether by working with a somewhat different family of functions, replacing the 
characters with dual functions of order k — 1 and almost periodic functions with 
uniformly almost periodic functions of order k — 2. 

We now discuss these concepts in more detail. We begin with the concept of a 
dual function. 


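Definition 11.13 (Dual function) If $f : Z \to \mathbf{C}$ and $d \geq 1$, we define the dual
function $\mathcal{D}_d(f) : Z \to \mathbf{C}$ recursively by

$$\mathcal{D}_1(f)(x) := \mathbf{E}_Z(f); \quad \mathcal{D}_d(f)(x) := \mathbf{E}_{h \in Z}\, T^h f(x)\, \mathcal{D}_{d-1}(\overline{T^h f}\, f)(x).$$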


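When $d = 2$ one can compute the dual function in terms of the Fourier
transform:

$$\begin{aligned}
\mathcal{D}_2(f)(x) &= \mathbf{E}_{h \in Z}\, T^h f(x)\, \mathbf{E}_Z(\overline{T^h f}\, f) \\
&= \mathbf{E}_{h,k \in Z}\, T^h f(x)\, T^k f(x)\, \overline{T^{h+k} f(x)} \\
&= \sum_{\xi \in Z} |\hat f(\xi)|^2 \hat f(\xi)\, e(\xi \cdot x);
\end{aligned} \tag{11.24}$$

we leave this as an exercise. The formula for higher $d$ is more complicated, for
instance

$$\mathcal{D}_3(f)(x) = \mathbf{E}_{h,k,r \in Z}\, T^h f\, T^k f\, T^r f\, \overline{T^{h+k} f}\, \overline{T^{h+r} f}\, \overline{T^{k+r} f}\, T^{h+k+r} f(x).$$

We observe the useful translation and conjugation invariance

$$\mathcal{D}_d(T^h f) = T^h \mathcal{D}_d(f); \quad \mathcal{D}_d(\overline{f}) = \overline{\mathcal{D}_d(f)} \tag{11.25}$$

which is easily established by induction.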
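As a numerical sanity check of the $d = 2$ formula, the following minimal sketch compares the brute-force average $\mathbf{E}_{h,k} f(x+h) f(x+k) \overline{f(x+h+k)}$ with the Fourier expression $\sum_\xi |\hat f(\xi)|^2 \hat f(\xi) e(\xi x/N)$ on a small cyclic group. It assumes $Z = Z_N$ with $T^h f(x) = f(x+h)$ and $\hat f(\xi) = \mathbf{E}_x f(x) e(-\xi x/N)$; the function names and the test function are illustrative choices, not part of the text.

```python
import cmath

def e(t):
    """The standard character e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * cmath.pi * t)

def fourier(f):
    """Fourier coefficients hat f(xi) = E_x f(x) e(-xi*x/N) on Z_N."""
    N = len(f)
    return [sum(f[x] * e(-xi * x / N) for x in range(N)) / N for xi in range(N)]

def dual2_direct(f):
    """D_2 f(x) = E_{h,k} f(x+h) f(x+k) conj(f(x+h+k)), by brute force."""
    N = len(f)
    return [
        sum(f[(x + h) % N] * f[(x + k) % N] * f[(x + h + k) % N].conjugate()
            for h in range(N) for k in range(N)) / N ** 2
        for x in range(N)
    ]

def dual2_fourier(f):
    """D_2 f(x) = sum_xi |hat f(xi)|^2 hat f(xi) e(xi*x/N)."""
    N = len(f)
    fh = fourier(f)
    return [sum(abs(fh[xi]) ** 2 * fh[xi] * e(xi * x / N) for xi in range(N))
            for x in range(N)]

N = 7
f = [complex((x * x) % 5 - 2, x % 3) / 4 for x in range(N)]  # arbitrary test function
gap = max(abs(a - b) for a, b in zip(dual2_direct(f), dual2_fourier(f)))
print(gap)  # should be of the size of floating-point rounding error
```

The same functions can be used to observe that a character is its own dual function of order 2.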
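Dual functions are intimately connected with the Gowers uniformity norm. An
easy induction gives the identity

$$\|f\|_{U^d(Z)}^{2^d} = \langle f, \mathcal{D}_d(f) \rangle_{L^2(Z)} = \mathbf{E}_{x \in Z} f(x) \overline{\mathcal{D}_d(f)(x)} \tag{11.26}$$

while from the Gowers–Cauchy–Schwarz inequality (11.6) we have the inequality

$$|\langle g, \mathcal{D}_d(f) \rangle_{L^2(Z)}| \leq \|g\|_{U^d(Z)} \|f\|_{U^d(Z)}^{2^d - 1} \tag{11.27}$$

for all $f, g : Z \to \mathbf{C}$. In particular we have the dual characterization of $U^d(Z)$:

$$\|g\|_{U^d(Z)} = \sup\{ |\langle g, \mathcal{D}_d(f) \rangle_{L^2(Z)}| : \|f\|_{U^d(Z)} \leq 1 \} \tag{11.28}$$

which explains the terminology “dual function”. From (11.26) we immediately
obtain an easy inverse theorem: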


Lemma 11.14 (Soft inverse theorem) Let $f : Z \to \mathbf{C}$ be a function bounded in
magnitude by 1, and let $F := \mathcal{D}_d(f)$ be the dual function. If $\|f\|_{U^d(Z)} \geq \eta$, then

$$|\langle f, F \rangle| \geq \eta^{2^d}.$$
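The identity behind the soft inverse theorem, $\|f\|_{U^d(Z)}^{2^d} = \langle f, \mathcal{D}_d(f) \rangle$, can be tested numerically on a small group. The sketch below is only an illustration under stated assumptions: it takes $Z = Z_N$, $T^h f(x) = f(x+h)$, $\langle f, g \rangle = \mathbf{E}_x f(x)\overline{g(x)}$, the recursion $\mathcal{D}_d(f) = \mathbf{E}_h T^h f \cdot \mathcal{D}_{d-1}(\overline{T^h f} f)$, and the usual cube formula for the Gowers norm. All function names are hypothetical.

```python
from itertools import product

def shift(f, h):
    """T^h f(x) = f(x + h) on Z_N."""
    N = len(f)
    return [f[(x + h) % N] for x in range(N)]

def dual(f, d):
    """Dual function D_d f via D_d f = E_h T^h f * D_{d-1}(conj(T^h f) * f)."""
    N = len(f)
    if d == 1:
        mean = sum(f) / N
        return [mean] * N
    out = [0j] * N
    for h in range(N):
        fh = shift(f, h)
        sub = dual([fh[x].conjugate() * f[x] for x in range(N)], d - 1)
        for x in range(N):
            out[x] += fh[x] * sub[x] / N
    return out

def gowers_power(f, d):
    """||f||_{U^d}^{2^d}: average over x, h_1..h_d of the 2^d-fold cube product."""
    N = len(f)
    total = 0j
    for x in range(N):
        for hs in product(range(N), repeat=d):
            term = 1 + 0j
            for omega in product((0, 1), repeat=d):
                v = f[(x + sum(o * h for o, h in zip(omega, hs))) % N]
                # conjugate the factors attached to vertices of odd weight
                term *= v.conjugate() if sum(omega) % 2 else v
            total += term
    return (total / N ** (d + 1)).real

def inner_product(f, g):
    """<f, g> = E_x f(x) conj(g(x))."""
    return sum(a * b.conjugate() for a, b in zip(f, g)) / len(f)

f = [complex((3 * x * x + x) % 5 - 2, x % 2) / 3 for x in range(5)]  # bounded by 1
for d in (1, 2, 3):
    print(d, gowers_power(f, d), inner_product(f, dual(f, d)))
```

In particular, if the printed Gowers quantity is at least $\eta^{2^d}$ then so is the correlation of $f$ with its dual function, which is the content of the lemma.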


Thus dual functions are a complete set of obstructions to Gowers uniformity,
and will play the role that the characters $e_\xi$ played in Section 10.5. (To see the
connection, observe that $\mathcal{D}_2(e_\xi) = e_\xi$ for any character $e_\xi$, thus characters are
themselves a kind of dual function.) To use this inverse theorem effectively in the
finitary ergodic argument, we need to show that functions that are generated out
of $\sigma$-algebras of dual functions obey some sort of “almost periodicity” property.
The actual definition is rather strange-looking and to motivate it we first give an
informal discussion. For the sake of concreteness we work in the group $Z_N$. In this
setting, all functions $f : Z_N \to \mathbf{C}$ are of course periodic of order $N$, but we are
interested in almost periodicity properties which occur for shifts much smaller
than $N$, in the sense that the shifts $T^n f$ are somehow compressed into a space of
“dimension” much smaller than $N$, whatever that means. As it turns out, there will
be a different notion of almost periodicity for each order $d - 1$; roughly speaking,
a function should be almost periodic of order $d - 1$ if its phase or phases behave
like a polynomial of degree $d - 1$.

Let us quantify this intuition with examples. The function $f(x) = e(\xi x/N)$ is
a model example of a function which we expect to be “almost periodic of order
1”, as its shifts $T^n f$ are quite recurrent. Indeed we have the formula

$$T^n f = c_n f$$

where $c_n$ are the constants $c_n = e(\xi n/N)$. If we instead take the function $f(x) =
e(\xi_1 x/N) + e(\xi_2 x/N)$, then this function would still be considered almost periodic
of order 1, since we have the formula

$$T^n f = c_{n,1} g_1 + c_{n,2} g_2$$

where $c_{n,j}$ are the constants $c_{n,j} = e(\xi_j n/N)$, and $g_j$ are the bounded functions
$g_j(x) = e(\xi_j x/N)$. Thus in this case the shifts $T^n f$ of $f$ only vary in a two-dimensional
space.
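The two-frequency shift formula is easy to confirm numerically. The following minimal sketch assumes $Z = Z_N$ with $T^n f(x) = f(x+n)$; the modulus and the frequencies are arbitrary illustrative values.

```python
import cmath

def e(t):
    """e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * cmath.pi * t)

N, XI1, XI2 = 11, 2, 5  # illustrative modulus and frequencies

g1 = [e(XI1 * x / N) for x in range(N)]
g2 = [e(XI2 * x / N) for x in range(N)]
f = [a + b for a, b in zip(g1, g2)]

def shift(func, n):
    """T^n func(x) = func(x + n) on Z_N."""
    return [func[(x + n) % N] for x in range(N)]

def shift_by_formula(n):
    """c_{n,1} g_1 + c_{n,2} g_2 with the constants c_{n,j} = e(xi_j n / N)."""
    c1, c2 = e(XI1 * n / N), e(XI2 * n / N)
    return [c1 * g1[x] + c2 * g2[x] for x in range(N)]

worst = max(abs(a - b)
            for n in range(N)
            for a, b in zip(shift(f, n), shift_by_formula(n)))
print(worst)  # should be of the size of floating-point rounding error
```

Every shift of $f$ is thus a combination of the same two fixed functions, with only the scalar coefficients depending on $n$.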

Next, we consider the function $f(x) = e(ax^2/N)$. This function would not be
considered almost periodic in the usual sense, as the shifts seem to take values in
a very high-dimensional space (as large as $N$). Indeed we have the shift formula

$$T^n f = c_n f$$

where the $c_n$ are no longer constant, but are themselves linearly independent functions
of $x$: $c_n(x) = e((2anx + an^2)/N)$. However, observe that while the $c_n$ are not
constant, they are still “simpler” than the original function $f$ because they are
almost periodic of order 1, whereas we expect the quadratic object $f$ to be almost
periodic of order 2.
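A short numerical check makes the quadratic example concrete. The sketch below assumes $Z = Z_N$ and $T^n f(x) = f(x+n)$, and uses the coefficient functions $c_n(x) = e((2anx + an^2)/N)$ obtained by expanding $a(x+n)^2$; the values of $N$ and $a$ are arbitrary illustrative choices.

```python
import cmath

def e(t):
    """e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * cmath.pi * t)

N, A = 11, 3  # illustrative modulus and quadratic coefficient

f = [e(A * x * x / N) for x in range(N)]

def c(n):
    """Coefficient function c_n(x) = e((2*A*n*x + A*n^2)/N)."""
    return [e((2 * A * n * x + A * n * n) / N) for x in range(N)]

def shift(func, n):
    """T^n func(x) = func(x + n) on Z_N."""
    return [func[(x + n) % N] for x in range(N)]

# T^n f = c_n f pointwise, for every shift n.
err_shift = max(abs(s - cn * fx)
                for n in range(N)
                for s, cn, fx in zip(shift(f, n), c(n), f))

# Each c_n is a linear phase: T^m c_n = e(2*A*n*m/N) * c_n.
err_linear = max(abs(s - e(2 * A * n * m / N) * cn)
                 for n in range(N) for m in range(N)
                 for s, cn in zip(shift(c(n), m), c(n)))
print(err_shift, err_linear)
```

The second check is the sense in which the coefficients are "almost periodic of order 1": shifting $c_n$ only multiplies it by a constant.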

One can of course continue these examples. They lead to the following recursive
heuristic: a function $f$ should be considered almost periodic of order $d - 1$ if
one has some representation of the form $T^n f = c_{n,1} g_1 + c_{n,2} g_2 + \cdots$, where the
$g_1, g_2, \ldots$ are bounded functions and the $c_{n,1}, c_{n,2}, \ldots$ are almost periodic of order
$d - 2$. Of course one should also provide some bound as to how many terms appear
in this expansion, otherwise everything will be almost periodic of every order.

A convenient way to formalize the above intuition is as follows. 


Definition 11.15 (Uniform almost periodicity norms) [357] If $f : Z \to \mathbf{C}$, we
define $\|f\|_{UAP^0(Z)}$ to be infinite if $f$ is non-constant, and equal to $|c|$ if $f$ is equal
to a constant $c$. If we now inductively assume that the $UAP^d(Z)$ norm has been
defined for some $d$, we define the $UAP^{d+1}(Z)$ norm of $f$ to be the infimum of all
the constants $M > 0$ for which one has a representation formula of the form

$$T^n f = M\, \mathbf{E}(c_{n,h}\, g_h) \quad \text{for all } n \in Z, \tag{11.29}$$

where $H$ is a finite non-empty set, $g = (g_h)_{h \in H}$ is a collection of functions from
$Z$ to $\mathbf{C}$ with $\|g_h\|_{L^\infty(Z)} \leq 1$, $c = (c_{n,h})_{n \in Z, h \in H}$ is a collection of functions from $Z$
to $\mathbf{C}$ with $\|c_{n,h}\|_{UAP^d(Z)} \leq 1$, and $h$ is a random variable taking values in $H$.


We informally refer to a function as uniformly almost periodic of order $d - 1$
if its $UAP^{d-1}(Z)$ norm is bounded.

One can easily check inductively that the $UAP^d(Z)$ norms are finite for $d \geq 1$,
and are indeed norms, in particular obeying the triangle inequality

$$\|f + g\|_{UAP^d(Z)} \leq \|f\|_{UAP^d(Z)} + \|g\|_{UAP^d(Z)}. \tag{11.30}$$

Moreover, we have the important Banach algebra property

$$\|fg\|_{UAP^d(Z)} \leq \|f\|_{UAP^d(Z)} \|g\|_{UAP^d(Z)}. \tag{11.31}$$


We leave the easy verification of these facts as an exercise; the rather complicated 
construction in Definition 11.15 was designed primarily in order to obtain these 
nice properties (11.30), (11.31). 

The $UAP^{d-1}$ norms are a kind of dual to the $U^d$ norms; see Exercise 11.4.8.
The $UAP^1$ norm is the same as the Wiener algebra norm; see Exercise 11.4.10.
They are also connected to dual functions: 


Lemma 11.16 Let $f : Z \to \mathbf{C}$ be a function bounded in magnitude by 1. Then
$\|\mathcal{D}_d(f)\|_{UAP^{d-1}(Z)} \leq 1$ for all $d \geq 1$.


Proof We induct on $d$. The case $d = 1$ is clear. Now suppose that $d \geq 2$ and
the claim has already been proven for $d - 1$. From the definition of $\mathcal{D}_d(f)$ and
(11.25), and the change of variables $n + h = h'$, we have

$$T^n \mathcal{D}_d(f) = \mathbf{E}_{h \in Z}\big(T^{n+h} f\, \mathcal{D}_{d-1}(\overline{T^{n+h} f}\, T^n f)\big) = \mathbf{E}_{h' \in Z}\big(\mathcal{D}_{d-1}(\overline{T^{h'} f}\, T^n f)\, T^{h'} f\big).$$

The claim then follows by setting $M := 1$, $H := Z$, $c_{n,h} := \mathcal{D}_{d-1}(\overline{T^h f}\, T^n f)$, and
$g_h := T^h f$.

Combining this with Lemma 11.14 we see that the uniformly almost periodic 


functions of order d — 1 form a complete set of obstructions for the Gowers uni- 
formity norm of order d: 














Corollary 11.17 (Soft inverse theorem, II) Let $f : Z \to \mathbf{C}$ be a function
bounded in magnitude by 1 with $\|f\|_{U^d(Z)} \geq \eta$. Then there exists a function
$F : Z \to \mathbf{C}$ such that $\|F\|_{UAP^{d-1}(Z)} \leq 1$ and $|\langle f, F \rangle| \geq \eta^{2^d}$.


One now has enough machinery to prove the following variant of 
Proposition 10.36. 


Proposition 11.18 (Koopman–von Neumann decomposition) [357] Let $k \geq 3$,
let $f : Z \to \mathbf{R}^+$ be such that $0 \leq f(x) \leq 1$, let $\sigma > 0$, and let $F : \mathbf{R}^+ \times \mathbf{R}^+ \to
\mathbf{R}^+$ be an arbitrary function. Then there exists a quantity $K = O_{\sigma,F,k}(1)$ and a
decomposition $f = f_{U^\perp} + f_U$ with the following properties:

• the “anti-uniform” component $f_{U^\perp}$ obeys the bounds $0 \leq f_{U^\perp} \leq 1$ and
$\mathbf{E}_Z f_{U^\perp} = \mathbf{E}_Z f$. Furthermore there exists an approximation $f_{UAP}$ to $f_{U^\perp}$ with
$0 \leq f_{UAP} \leq 1$, $\|f_{U^\perp} - f_{UAP}\|_{L^2(Z)} \leq \sigma$, and $\|f_{UAP}\|_{UAP^{k-2}(Z)} \leq K$;

• the “uniform” component $f_U$ obeys the Gowers uniformity estimate

$$\|f_U\|_{U^{k-1}(Z)} \leq \frac{1}{F(\sigma, K)}.$$


This proposition is proven by almost identical means to Proposition 10.36 and 
we leave it as an exercise. The soft inverse theorem in Corollary 11.17 allows us 
to use uniformly almost periodic functions as a substitute for characters (and for 
quasi-periodic functions); the Banach algebra properties of such functions are the 
substitute for the fact that polynomial combinations of almost periodic functions 
are almost periodic. Otherwise the proof is much the same. 

To conclude the proof of Szemerédi’s theorem $r_k(Z_N) = o_{N \to \infty; k}(N)$, one needs
a recurrence theorem for the almost periodic component:


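Proposition 11.19 (Uniformly almost periodic functions are recurrent) [357]
Let $k \geq 3$, let $N$ be a large prime, and let $f_{U^\perp}, f_{UAP} : Z_N \to \mathbf{R}^+$ be
such that $0 \leq f_{U^\perp}, f_{UAP} \leq 1$, $\mathbf{E}_{Z_N} f_{U^\perp} \geq \delta$, $\|f_{U^\perp} - f_{UAP}\|_{L^2(Z_N)} \leq \frac{\delta^2}{1024k}$, and
$\|f_{UAP}\|_{UAP^{k-2}(Z_N)} \leq K$. Then we have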


$$\Lambda_k(f_{U^\perp}, \ldots, f_{U^\perp}) = \Omega_{k,\delta,K}(1).$$


From Proposition 11.19, Proposition 11.18 and (11.8) one can conclude Szemerédi’s
theorem by the same argument as in Section 10.5. The proof of Proposition
11.19, however, is rather difficult, invoking an induction on $k$, the use
of an energy increment argument to regularize certain $\sigma$-algebras which will
appear, some Hilbert space arguments to locally compactify shift orbits such as
$\{T^n f_{UAP} : n \in Z_N\}$, and then van der Waerden’s theorem to find monochromatic
arithmetic progressions, where the coloring is determined by the local compactification.
We will not prove it in full generality here, referring the reader to
[357] for full details. However, we will sketch the somewhat simpler $k = 3$ version
of the argument below. In this case one could instead rely on Exercise 11.4.10 and
Proposition 10.35 to obtain a simpler proof with much more efficient bounds, but
the argument we give below does not require the Fourier transform and can be
extended (with additional arguments) to the higher $k$ case.


Proof of Proposition 11.19 in the $k = 3$ case (Sketch) We consider the shifts
$\{T^n f_{UAP} : n \in Z_N\}$ as a subset of $L^2(Z_N)$. Since $\|f_{UAP}\|_{UAP^1} \leq K$, we see that
there exists a random variable $h$ taking values in a finite set $H$ and functions
$g_h : Z_N \to \mathbf{C}$ with $\|g_h\|_{L^\infty(Z_N)} \leq 1$, such that all the shifts $T^n f_{UAP}$ are contained
in the set

$$\Gamma := \{K\, \mathbf{E}(c_h g_h) : c_h \in \mathbf{C}, |c_h| \leq 1 \text{ for all } h \in H\} \tag{11.32}$$

which can be thought of as a kind of high-dimensional cube. It turns out that
this set is “compact” in the sense that it can be covered by $O_{K,\delta}(1)$ balls in
$L^2(Z_N)$ of radius $\delta^2/1024$ (see Exercise 11.4.13). This induces a coloring of $Z_N$
by $O_{\delta,K}(1)$ colors, by assigning to each $n \in Z_N$ one of the balls that contains $T^n f_{UAP}$.
By van der Waerden’s theorem (Exercise 6.3.9), we conclude that for a proportion $\Omega_{\delta,K}(1)$
of the pairs $(a, r) \in Z_N^2$, the triple $a, a + r, a + 2r$ is monochromatic, so
that the functions $T^a f_{UAP}, T^{a+r} f_{UAP}, T^{a+2r} f_{UAP}$ lie in the same $\delta^2/1024$-ball.
This implies that the functions $T^a f_{U^\perp}, T^{a+r} f_{U^\perp}, T^{a+2r} f_{U^\perp}$ are at distance at most
$\delta/512$ apart. Since these functions are also bounded between 0 and 1 and have
mean $\delta$, an application of Markov’s inequality then shows that these functions
are simultaneously greater than $\delta/4$ (say) on a set of density at least $\delta/4$. Thus
$\mathbf{E}(T^a f_{U^\perp}\, T^{a+r} f_{U^\perp}\, T^{a+2r} f_{U^\perp}) = \Omega_\delta(1)$ for all such pairs $(a, r)$. Taking averages
over all $a, r$ we obtain the claim.
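The Markov-type step at the end of the sketch rests on a simple fact: a function taking values in $[0, 1]$ with mean $\delta$ must exceed $\delta/2$ on a set of density at least $\delta/2$, since otherwise its mean would be below $\delta/2 + \delta/2$. The following minimal sketch checks this single-function version on random data; the thresholds $\delta/2$ are a simplification chosen for illustration and are not the constants used in the sketch above, and the function names are hypothetical.

```python
import random

def density_above(f, threshold):
    """Fraction of points x with f(x) > threshold."""
    return sum(1 for v in f if v > threshold) / len(f)

def markov_step_holds(f):
    """For 0 <= f <= 1 with mean delta, check that {f > delta/2} has density >= delta/2."""
    delta = sum(f) / len(f)
    return density_above(f, delta / 2) >= delta / 2

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [[rng.random() ** 3 for _ in range(101)] for _ in range(200)]
print(all(markov_step_holds(f) for f in samples))
```

In the proof the same idea is applied to three nearby functions at once, which is why the thresholds there are smaller.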

















Exercises 


11.4.1 Prove (11.24) and (11.25). 

11.4.2 Prove (11.26), (11.27), and (11.28). 

11.4.3 Verify that $\|f\|_{UAP^d(Z)}$ is well-defined and finite for all $d \geq 1$, and obeys
(11.30) and (11.31). In particular, verify that the $UAP^d(Z)$ norm is indeed
a norm.

11.4.4 Establish the monotonicity property $\|f\|_{UAP^{d+1}(Z)} \leq \|f\|_{UAP^d(Z)}$ for all
$f : Z \to \mathbf{C}$ and $d \geq 0$.


11.4.5 Let $\phi : Z \to Z'$ be a Freiman isomorphism of order 2. Show that
$\|f \circ \phi\|_{UAP^{d-1}(Z)} = \|f\|_{UAP^{d-1}(Z')}$ for all $f : Z' \to \mathbf{C}$ and $d \geq 1$. In particular,
the $UAP^{d-1}(Z)$ norms are translation-invariant.

11.4.6 Let $\phi : Z \to \mathbf{R}/\mathbf{Z}$ be a phase polynomial of degree less than $d$. Show
that $\|e(\phi) f\|_{UAP^{d-1}(Z)} = \|f\|_{UAP^{d-1}(Z)}$ for all $f : Z \to \mathbf{C}$.

11.4.7 Let $\phi : Z \to \mathbf{R}/\mathbf{Z}$ be a phase polynomial of degree less than $d$. Show that
$\mathcal{D}_d(e(\phi)) = e(\phi)$ and $\|e(\phi)\|_{UAP^{d-1}(Z)} = 1$, thus every polynomial phase
function is a dual function.

11.4.8 [357] Obtain the inequality

$$|\langle f, g \rangle_{L^2(Z)}| \leq \|f\|_{U^d(Z)} \|g\|_{UAP^{d-1}(Z)}$$

for any $d \geq 1$ and $f, g : Z \to \mathbf{C}$. (Hint: use induction on $d$.) Thus functions
which are uniformly almost periodic of order $d - 1$ are almost
orthogonal to Gowers uniform functions of order $d$. This can be viewed
as a partial converse to Corollary 11.17. Note in particular that we have

$$\|f\|_{L^2(Z)}^2 \leq \|f\|_{U^d(Z)} \|f\|_{UAP^{d-1}(Z)},$$

thus a function cannot be simultaneously uniformly almost periodic and
Gowers uniform without also being small.

11.4.9 Let $f, g : Z \to \mathbf{C}$ be functions bounded in magnitude by 1. Establish the
inequality

$$\|fg\|_{U^d(Z)}^{2^d} \leq \|f\|_{UAP^{d-1}(Z)} \|g\|_{U^d(Z)}$$

for all $d \geq 1$. (Hint: use Lemma 11.16 applied to $fg$, together with the
algebra property of $UAP^{d-1}(Z)$.)

11.4.10 (Ben Green, private communication) Show that $\|f\|_{UAP^1(Z)} = \|f\|_{A(Z)}$
for all $f : Z \to \mathbf{C}$. (Hint: from Exercise 11.4.7 and the triangle inequality
one can obtain the inequality $\|f\|_{UAP^1(Z)} \leq \|f\|_{A(Z)}$. To obtain
the other inequality, first use Plancherel’s theorem to establish that
$|\mathbf{E}_{n,x \in Z}\, c_{n,h}\, g_h(x)\, b(x + n)| \leq 1$ whenever $c_{n,h}$ is a constant bounded by 1,
$g_h$ is a function with $\|g_h\|_{L^\infty(Z)} \leq 1$, and $b$ is a function with $\|\hat b\|_{l^\infty(Z)} \leq 1$.)

11.4.11 [357] Prove Proposition 11.18.

11.4.12 Use Proposition 11.19, Proposition 11.18 and (11.8) to deduce that
$r_k(Z_N) = o_{N \to \infty; k}(N)$ for all $k \geq 1$ and all large $N$.

11.4.13 [357] Let $\Gamma$ be the set defined in (11.32). Show that given any $\varepsilon > 0$, the
set $\Gamma$ can be covered by $O_{\varepsilon,K}(1)$ balls in $L^2(Z)$ of radius $\varepsilon$. (Hint: find
a maximal orthonormal set $v_1, \ldots, v_J$ such that $\mathbf{E}|\langle g_h, v_j \rangle|^2 \geq \varepsilon^2/4K^2$
for all $1 \leq j \leq J$, and use Bessel’s inequality and linearity of expectation
to obtain an upper bound on $J$. Show that the quantities $K\, \mathbf{E}(c_h g_h)$ stay within
$\varepsilon/2$ of the $J$-dimensional space spanned by $v_1, \ldots, v_J$.)


11.5 The infinitary ergodic approach 


In this section we discuss some of the ideas underlying Furstenberg’s infinitary 
ergodic approach to Szemerédi’s theorem. These arguments are the shortest and 
most elegant way to prove the theorem, but also require a certain amount of machin- 
ery concerning infinite measure spaces. Also it is quite difficult to extract a quan- 
titative bound from these methods. As the techniques here are rather disjoint from 
those in the rest of this book we shall not provide full details, referring the reader 
instead to [122]. However, the insights developed here were essential in devel- 
oping several of the finitary arguments in this chapter, most notably the finitary 
ergodic proof of Szemerédi’s theorem, and the Green—Tao theorem on arithmetic 
progressions in the primes. 

Define a measure-preserving system to be a (possibly infinite) space $X$ with a $\sigma$-algebra
$\mathcal{B}$, a probability measure $\mathbf{P}_X$ on $\mathcal{B}$, and a bijection $T : X \to X$ such that all
the powers $T^n$ of $T$ with $n \in \mathbf{Z}$ are measure-preserving, thus $\mathbf{P}_X(T^n A) = \mathbf{P}_X(A)$
for all $A \in \mathcal{B}$. In this infinite setting, a $\sigma$-algebra cannot be rigorously viewed
as a partition; instead it is a collection of sets closed under countable unions,
intersections, and complements, and containing $\emptyset$ and $X$. We define an expectation
$\mathbf{E}_X$ on bounded measurable functions from $X$ to $\mathbf{R}$ in the usual manner, and define
a shift operator $T^n$ on such functions by $T^n f(x) := f(T^n x)$. To simplify the
notation slightly we shall only work with real-valued functions in this section
rather than complex-valued ones.


Example 11.20 (Circle shift) Let $X$ be the unit circle $\mathbf{R}/\mathbf{Z}$ with Lebesgue measure,
and let $T$ be the shift $Tx = x + \alpha$ for some fixed $\alpha \in \mathbf{R}$. The dynamics of
this system depend on whether $\alpha$ is rational or irrational; for instance, in the former
case the shift $T$ is periodic, but not in the latter case. However in both cases we
have the following almost periodicity property: given any bounded measurable
function $f$ on $X$, the shifts $\{T^n f : n \in \mathbf{Z}\}$ are pre-compact in $L^2(X)$. In particular
given any $\varepsilon > 0$ we have $\|T^n f - f\|_{L^2(X)} \leq \varepsilon$ for infinitely many $n$. Because of this
property we say that this measure-preserving system is compact.
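The recurrence property of the circle shift can be made explicit for one simple observable. For $f(x) = \cos(2\pi x)$ a direct computation gives $\|T^n f - f\|_{L^2(X)}^2 = 1 - \cos(2\pi n\alpha)$, so the return times are exactly the $n$ for which $n\alpha$ is close to an integer. The sketch below lists such $n$; the choice of $f$, the rotation number (the golden ratio minus one) and the function names are illustrative assumptions, not part of the text.

```python
import math

ALPHA = (math.sqrt(5) - 1) / 2  # illustrative irrational rotation number

def l2_distance(n, alpha=ALPHA):
    """||T^n f - f||_{L^2} for f(x) = cos(2*pi*x), via 1 - cos(2*pi*n*alpha)."""
    return math.sqrt(1 - math.cos(2 * math.pi * n * alpha))

def return_times(eps, limit, alpha=ALPHA):
    """All 1 <= n <= limit with ||T^n f - f||_{L^2} <= eps."""
    return [n for n in range(1, limit + 1) if l2_distance(n, alpha) <= eps]

times = return_times(0.05, 1000)
print(times)
```

For this particular rotation the listed return times include Fibonacci numbers, reflecting the continued fraction expansion of $\alpha$; other rotations give different but still infinite sets of return times.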


Example 11.21 (Skew shift) Let $X$ be the torus $(\mathbf{R}/\mathbf{Z}) \times (\mathbf{R}/\mathbf{Z})$ with Lebesgue
measure, and let $T$ be the skew shift $T(x, y) := (x + \alpha, y + x)$ for some fixed
$\alpha \in \mathbf{R}$. Note that the orbits $T^n(x, y)$ are linear in $n$ in the $x$ variable, but quadratic
in $n$ in the $y$ variable. This system is not compact, but contains a non-trivial compact
factor, namely the $\sigma$-algebra $\mathcal{B}_0$ consisting of all the sets of the form $A \times (\mathbf{R}/\mathbf{Z})$,
where $A$ is Borel measurable in $\mathbf{R}/\mathbf{Z}$. (To put this another way, the $\mathcal{B}_0$-measurable
functions are precisely those functions which do not depend on the $y$ variable.)
This factor is isomorphic to the circle shift mentioned earlier. It turns out that the
skew shift is a relatively compact extension of the circle shift, though we will not
quantify precisely what this means here except to observe that if $f$ is a smooth
function on $(\mathbf{R}/\mathbf{Z}) \times (\mathbf{R}/\mathbf{Z})$, then the orbits $\{T^n f : n \in \mathbf{Z}\}$ form a precompact set
on each fiber $\{x = \text{constant}\}$ of $\mathcal{B}_0$, endowed with the obvious one-dimensional
measure.
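The remark that the orbit is linear in $n$ in the first coordinate and quadratic in the second can be checked against the closed form $T^n(x, y) = (x + n\alpha,\; y + nx + \binom{n}{2}\alpha)$, which follows by induction from $T(x, y) = (x + \alpha, y + x)$. The minimal sketch below uses exact rational arithmetic modulo 1; the rational value of $\alpha$ and the starting point are illustrative only (the identity is algebraic and does not depend on $\alpha$ being irrational), and the function names are hypothetical.

```python
from fractions import Fraction

def skew_shift(point, alpha):
    """One step of T(x, y) = (x + alpha, y + x) on the torus (coordinates mod 1)."""
    x, y = point
    return ((x + alpha) % 1, (y + x) % 1)

def orbit_closed_form(point, alpha, n):
    """T^n(x, y) = (x + n*alpha, y + n*x + (n choose 2)*alpha) mod 1, for n >= 0."""
    x, y = point
    return ((x + n * alpha) % 1,
            (y + n * x + Fraction(n * (n - 1), 2) * alpha) % 1)

alpha = Fraction(3, 17)                     # illustrative rotation number
start = (Fraction(1, 5), Fraction(2, 7))    # illustrative starting point

current = start
agree = True
for n in range(1, 21):
    current = skew_shift(current, alpha)
    agree = agree and current == orbit_closed_form(start, alpha, n)
print(agree)
```

The quadratic term $\binom{n}{2}\alpha$ in the second coordinate is what prevents the system from being compact, while the first coordinate alone reproduces the circle shift.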


Example 11.22 (Bernoulli shift) Now consider the infinite unit cube $X :=
\{0, 1\}^{\mathbf{Z}}$ of infinite binary sequences $(\omega_n)_{n \in \mathbf{Z}}$, with the usual product topology
and Borel $\sigma$-algebra $\mathcal{B}$. Let $B \subset X$ denote the “cylinder” of sequences where
$\omega_0 = 1$, and let $T$ be the shift operator defined by $T^h (\omega_n)_{n \in \mathbf{Z}} := (\omega_{n+h})_{n \in \mathbf{Z}}$. Using
the Kolmogorov extension theorem (or Caratheodory’s extension theorem and
Tychonoff’s theorem) we can find a measure $\mathbf{P}$ on $X$ such that

$$\mathbf{P}(T^{h_1} B \cap \cdots \cap T^{h_m} B) = 2^{-m}$$

whenever $h_1, \ldots, h_m$ are distinct integers. Informally, one can view this system as
the probability space corresponding to an infinite number of coin tosses, one for
each integer $h$; the event $T^h B$ is then the event that the $h$th coin turns up heads, and
the shift operator corresponds to relabeling all of the coins up by 1. The behavior
here is completely different from the compact case; indeed, if $f$ is bounded and
measurable, and has mean zero, one can show that $\langle T^n f, f \rangle_{L^2(X)} \to 0$ as $n \to \infty$.
A system with this property is known as strongly mixing.
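For the simplest mean-zero observable, $f = 1_B - 1/2$, the decay of correlations can be verified by exact enumeration, since only finitely many coin tosses are involved. The sketch below is an illustration for this single-coordinate observable only (general bounded $f$ require an approximation argument); the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def cylinder_probability(shifts):
    """P(all coins indexed by `shifts` show heads), by enumerating fair tosses."""
    coords = sorted(set(shifts))
    outcomes = list(product((0, 1), repeat=len(coords)))
    hits = sum(1 for omega in outcomes if all(omega))
    return Fraction(hits, len(outcomes))

def correlation(n):
    """E[(w_0 - 1/2)(w_n - 1/2)] for independent fair coins w_0, w_n."""
    coords = sorted({0, n})
    outcomes = list(product((0, 1), repeat=len(coords)))
    half = Fraction(1, 2)
    total = Fraction(0)
    for omega in outcomes:
        value = dict(zip(coords, omega))
        total += (value[0] - half) * (value[n] - half)
    return total / len(outcomes)

print(cylinder_probability([0, 3, 7]), correlation(0), correlation(5))
```

Here the correlation vanishes exactly for every non-zero shift, which is a much stronger statement than the limit in the definition of strong mixing.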


Furstenberg derived Szemerédi’s theorem by proving the following equivalent 
formulation. 


Theorem 11.23 (Furstenberg multiple recurrence theorem) [121], [125],
[122] Let $(X, \mathcal{B}, \mathbf{P}, T)$ be a measure-preserving system, and let $f : X \to \mathbf{R}^+$ be
a non-negative bounded measurable function with $\mathbf{E}(f) > 0$. Then for all $k \geq 1$
we have

$$\liminf_{N \to \infty} \mathbf{E}_{1 \leq n \leq N} \mathbf{E}_X\, f\, T^n f \cdots T^{(k-1)n} f > 0.$$


It is fairly easy to deduce this theorem from Szemerédi’s theorem; we leave this 
as an exercise. The converse deduction of Szemerédi’s theorem from Furstenberg’s 
theorem is a little trickier, requiring some measure-theoretic tools: 


Proof of Theorem 10.1 assuming Theorem 11.23 (Sketch) Suppose for contradiction
that we can find a set $A \subset \mathbf{Z}$ of positive upper density containing no
progressions of length $k$. Thus we can find a sequence of integers $N_1, N_2, \ldots$ going
to infinity such that $\liminf_{j \to \infty} \mathbf{P}_{[-N_j, N_j]}(A) > 0$. Now use the Hahn–Banach
theorem to construct a linear functional $\lambda$ on bounded real-valued sequences
$(c_j)_{j=1}^\infty$ such that

$$\liminf_{j \to \infty} c_j \leq \lambda((c_j)_{j=1}^\infty) \leq \limsup_{j \to \infty} c_j.$$

Now consider the infinite unit cube $X := \{0, 1\}^{\mathbf{Z}}$ of infinite binary sequences
$(\omega_n)_{n \in \mathbf{Z}}$, with the usual product topology and Borel $\sigma$-algebra $\mathcal{B}$. Let $B \subset X$
denote the “cylinder” of sequences where $\omega_0 = 1$, and let $T$ be the shift operator
defined by $T^h (\omega_n)_{n \in \mathbf{Z}} := (\omega_{n+h})_{n \in \mathbf{Z}}$. Using the Kolmogorov extension theorem
(or Caratheodory’s extension theorem and Tychonoff’s theorem) we can find a
measure $\mathbf{P}$ on $X$ such that

$$\mathbf{P}(T^{h_1} B \cap \cdots \cap T^{h_m} B) = \lambda\big( (\mathbf{P}_{[-N_j, N_j]}((A + h_1) \cap \cdots \cap (A + h_m)))_{j=1}^\infty \big)$$

for all $h_1, \ldots, h_m \in \mathbf{Z}$. In particular we see that $\mathbf{P}(B) > 0$. By Theorem 11.23
applied to $f = 1_B$ we conclude that $\mathbf{P}(B \cap T^n B \cap \cdots \cap T^{(k-1)n} B) > 0$ for at least one
non-zero $n$, which implies that $A$ contains a progression of length $k$.














One can prove the multiple recurrence theorem in a manner similar to that in the
previous sections. For instance, there is an analog of the Gowers uniformity norm
$\|f\|_{U^d(X)}$, defined inductively for bounded measurable $f$ by $\|f\|_{U^0(X)} := \mathbf{E}_X(f)$
and

$$\|f\|_{U^d(X)} := \lim_{N \to \infty} \left( \mathbf{E}_{1 \leq h \leq N} \|f\, T^h f\|_{U^{d-1}(X)}^{2^{d-1}} \right)^{1/2^d}.$$

(The existence of this limit is guaranteed by the von Neumann ergodic theorem;
see [185].) One can verify that these $U^d$ norms obey properties similar to their
finitary counterparts; see [185], with a key distinction that it is now quite possible
for a non-zero function $f$ to have a vanishing $U^d$ norm. We have an important
analog of the generalized von Neumann theorem (11.8), namely that

$$\lim_{N \to \infty} \mathbf{E}_{1 \leq n \leq N} \mathbf{E}_X\, f_0\, T^n f_1 \cdots T^{(k-1)n} f_{k-1} = 0$$

whenever $f_0, \ldots, f_{k-1}$ are bounded measurable functions with at least one of the
$f_j$ having a vanishing $U^{k-1}$ norm. Thus functions with vanishing $U^{k-1}$ norm have
a negligible impact on recurrence.

Again, attention now turns towards the obstructions to uniformity. It turns out
that in the infinitary setting these obstructions have a rather nice description. Let
$U^{k-1}(X)^*$ denote the space of all bounded functions $f$ for which the expression

$$\|f\|_{U^{k-1}(X)^*} := \sup\{ |\mathbf{E}_X(fg)| : \|g\|_{U^{k-1}(X)} \leq 1 \}$$

is finite. It turns out (see [185]) that there exists a unique $\sigma$-algebra $\mathcal{Z}_{k-2}$ such
that the closure of $U^{k-1}(X)^*$ in the $L^2$ topology consists precisely of those square-integrable
functions which are measurable with respect to $\mathcal{Z}_{k-2}$; the $\mathcal{Z}_{k-2}$ are thus
the universal characteristic factor for the $U^{k-1}(X)$ norm. As a consequence one
can precisely quantify which functions are Gowers uniform of order $k - 1$:

$$\|f\|_{U^{k-1}(X)} = 0 \iff \mathbf{E}(f \mid \mathcal{Z}_{k-2}) = 0.$$


Here the conditional expectation $f \mapsto \mathbf{E}(f \mid \mathcal{Z}_{k-2})$ is defined as the $L^2$-orthogonal
projection onto the space of $\mathcal{Z}_{k-2}$-measurable functions.

One consequence of the above discussion is that in order to prove the Furstenberg
recurrence theorem, it suffices to do so under the additional assumption that
$f$ is $\mathcal{Z}_{k-2}$-measurable (because the error $f - \mathbf{E}(f \mid \mathcal{Z}_{k-2})$ has a vanishing $U^{k-1}(X)$
norm and is hence irrelevant). To do this, it is clearly of importance to understand
the factors $\mathcal{Z}_{k-2}$ of $\mathcal{B}$ as much as possible.

The factor $\mathcal{Z}_0$ turns out to be the space of invariant sets in $X$, i.e. $\mathcal{Z}_0 := \{A \in \mathcal{B} :
TA = A\}$. This is essentially the von Neumann ergodic theorem, which we leave to
the exercises. The factor $\mathcal{Z}_1$ is known as the Kronecker factor and is generated by
all the almost periodic functions, or equivalently by the eigenfunctions of the shift
operator $T$. The higher factors are more difficult to describe explicitly. However, it
can be shown without too much difficulty (see e.g. [121], [185], [236]; a closely
related result is in [386]) that each factor $\mathcal{Z}_{d+1}$ is a relatively compact extension of
the preceding factor $\mathcal{Z}_d$ (in fact, it is the maximal relatively compact extension).
What this means is a little bit tricky to describe precisely, but it roughly means that
for a dense set of $f$ which are measurable in $\mathcal{Z}_{d+1}$, the orbits $\{T^n f : n \in \mathbf{Z}\}$ are
precompact relative to $\mathcal{Z}_d$, which informally means that they are precompact when
restricted to each “atom” or “fiber” of $\mathcal{Z}_d$. See [122] for a rigorous formulation of
these assertions (which requires the theory of disintegration of measures). Using
some tools from measure theory and analysis, as well as a combinatorial argument
closely related to the van der Waerden theorem, it was shown in [121], [125] that if
the Furstenberg recurrence theorem holds for any factor $\mathcal{Z}_d$, then it also holds for
a relatively compact extension $\mathcal{Z}_{d+1}$; this is analogous to Proposition 11.19. This
fact, combined with the preceding discussion, yields the Furstenberg recurrence
theorem and thus Szemerédi’s theorem.

Recently, there has been significant progress by Host–Kra [185] (and subsequently
by Ziegler [386]) in understanding the factors $\mathcal{Z}_{k-2}$. (Strictly speaking,
Ziegler treats a slight variant $\mathcal{Y}_{k-2}$ of the factors $\mathcal{Z}_{k-2}$; see [236] for a comparison
between the two.) It turns out that the factors $\mathcal{Z}_{k-2}$ are isomorphic to the inverse
limit of $(k-2)$-step nilsystems, or in other words a system $(G/\Gamma, \mathcal{B}, T, \mathbf{P})$, where $G$
is a nilpotent Lie group of order $k - 2$, $\Gamma$ is a co-compact subgroup of $G$, $\mathcal{B}$ is the
usual Borel algebra, $T$ is a left shift operator $T : x\Gamma \mapsto gx\Gamma$ for some fixed group
element $g \in G$, and $\mathbf{P}$ is normalized Haar measure. Thus for instance the circle
shift in Example 11.20 is a 1-step nilsystem, whereas the skew shift turns out to
be isomorphic to a 2-step nilsystem. These characterizations of $\mathcal{Z}_{k-2}$ are roughly
analogous to the “hard” inverse theorems discussed in Section 11.2; see [160] for
further discussion of this in the $k = 4$ case. Just as these hard inverse theorems
lead to better quantitative results on Szemerédi’s theorem, the characterizations of
$\mathcal{Z}_{k-2}$ given here lead to stronger recurrence theorems; for instance, they can be
used to replace the limit inferior in the Furstenberg recurrence theorem with a limit,
and in fact obtain the stronger result that the averages $\mathbf{E}_{1 \leq n \leq N} T^n f \cdots T^{(k-1)n} f$
converge in $L^2$ norm to a non-zero (and somewhat explicitly describable) function.
See [185], [386]. A current area of research is to develop and simplify these
ergodic theory results (which are currently quite difficult and lengthy to prove) and
clarify their connection with the analogous developments in the Fourier-analytic
and combinatorial approaches.

The ergodic approach is well suited for establishing stronger combinatorial 
results than Szemerédi’s theorem, several of which have not yet been proven by 
other means. We describe some of them here. 


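Theorem 11.24 (Multi-dimensional Szemerédi theorem) [123] Let $d \geq 1$,
and let $A \subset \mathbf{Z}^d$ be such that $\limsup_{N \to \infty} \mathbf{P}_{[-N,N]^d}(A) > 0$. Then for any
$v_1, \ldots, v_k \in \mathbf{Z}^d$, there exist infinitely many pairs $(a, r) \in \mathbf{Z}^d \times \mathbf{Z}^+$ such that
$a + r v_1, \ldots, a + r v_k \in A$.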


Theorem 11.25 (Polynomial Szemerédi theorem) [23] Let $P_1, \ldots, P_k : \mathbf{Z} \to
\mathbf{Z}$ be polynomials that map the integers to the integers such that $P_1(0) = \cdots =
P_k(0) = 0$. Let $A \subset \mathbf{Z}$ have positive upper density. Then there exist infinitely many
pairs $(a, r) \in \mathbf{Z} \times \mathbf{Z}^+$ such that $a + P_1(r), \ldots, a + P_k(r) \in A$.


Theorem 11.26 (Density Hales-Jewett theorem) [124] Let $n \geq 1$ and $0 < \delta \leq
1$. Then there exists an integer $d = d(n, \delta) \geq 1$ such that if $A$ is any subset
of $[0, n - 1]^d$ with cardinality $|A| \geq \delta n^d$, then $A$ contains a proper arithmetic
progression $a + [0, n - 1] \cdot v$ of length $n$, for some $a \in [0, n - 1]^d$ and
$v \in [0, 1]^d$.


Further refinements include additional structural information on the pairs (a, r) 
constructed by the above theorems, as well as convergence of various limits; in 
addition, there is much current work in extending the description of the charac- 
teristic factor for the U* norm and for multiple recurrence to these more complex 
recurrence theorems. Unfortunately a complete survey of these exciting develop- 
ments is well beyond the scope of this book. 


Exercises 


11.5.1 Show that Theorem 11.1 for a fixed k implies Theorem 11.23 for the same 
value of k. 

11.5.2 (Poincaré recurrence theorem) Using only the pigeonhole principle 
and elementary measure theory, prove Theorem 11.23 in the k = 2 
case. 


11.5.3 (Von Neumann ergodic theorem) Let $(X, \mathcal{B}, \mathbf{P}, T)$ be a measure-preserving
system. Show that the space $\{f \in L^2(X) : Tf = f\}$ and the closure of
$\{Tf - f : f \in L^2(X)\}$ are complementary orthogonal subspaces of
$L^2(X)$. Use this to conclude that if $\mathcal{Z}_0 := \{A \in \mathcal{B} : TA = A\}$, then
$\mathbf{E}_{1 \le n \le N} T^n f$ converges in $L^2(X)$ to $\mathbf{E}(f | \mathcal{Z}_0)$ for any $f \in L^2(X)$, and that
$\|f\|_{U^1(X)} = \|\mathbf{E}(f | \mathcal{Z}_0)\|_{L^2(X)}$. Note that these results simplify in the case
when the system is ergodic (which means that $\mathcal{Z}_0 = \{\emptyset, X\}$), since in that
case $\mathbf{E}(f | \mathcal{Z}_0)$ is just $\mathbf{E}_X(f)$. In particular we have $\|f\|_{U^1(X)} = |\mathbf{E}_X(f)|$ in
this case, just as in the finitary case.

11.5.4 (Khintchine’s recurrence theorem) Let $A$ be a subset of a measure-preserving
system $(X, \mathcal{B}, \mathbf{P}, T)$. Show that for every $\varepsilon > 0$ there exist
infinitely many $n \in \mathbf{Z}$ such that $\mathbf{P}(A \cap T^n A) \ge \mathbf{P}(A)^2 - \varepsilon$. (Hint: obtain
lower and upper bounds for $\|\mathbf{E}_{1 \le n \le N} 1_{T^n A}\|_{L^2(X)}$. Alternatively, use the von
Neumann ergodic theorem.) Show that the theorem fails if $\mathbf{P}(A)^2 - \varepsilon$ is
replaced by $\mathbf{P}(A)^2 + \varepsilon$, regardless of how small $\mathbf{P}(A)$ and $\varepsilon$ are. It is natural
to then conjecture that $\mathbf{P}(A \cap T^n A \cap \cdots \cap T^{(k-1)n} A) \ge \mathbf{P}(A)^k - \varepsilon$
for infinitely many $n$; this is true for $k = 1, 2, 3, 4$ under the additional
assumption of ergodicity, but fails for $k > 4$, see [22].

11.5.5 Let $(X, \mathcal{B}, \mathbf{P}, T)$ be a compact measure-preserving system (so the orbits
$\{T^n f : n \in \mathbf{Z}\}$ are precompact in $L^2$ whenever $f$ is bounded and measurable).
Prove the Furstenberg recurrence theorem in this special case.
(Compare with Proposition 10.35 or the $k = 3$ proof of Proposition
11.19.)

11.5.6 Let $(X, \mathcal{B}, \mathbf{P}, T)$ be a weakly mixing measure-preserving system,
which means that $\lim_{N \to \infty} \mathbf{E}_{1 \le n \le N} |\langle T^n f, f \rangle|^2 = 0$ whenever $f$ is
bounded, measurable, and has expectation zero. (This is weaker than
strong mixing, which demands that $\lim_{n \to \infty} \langle T^n f, f \rangle = 0$ under the same
hypotheses.) Show that $\|f\|_{U^{k-1}(X)} = 0$ if and only if $\mathbf{E}_X(f) = 0$, and
establish the Furstenberg recurrence theorem in this special case.

11.5.7 Let $(X, \mathcal{B}, \mathbf{P}, T)$ be a measure-preserving system, and let $f$ be bounded
and measurable. Show that if $f$ is almost periodic (thus the orbit
$\{T^n f : n \in \mathbf{Z}\}$ is precompact in $L^2(X)$), then $\mathbf{E}_X(fg) = 0$ whenever $g$ is
bounded, measurable, and vanishing in $U^2(X)$ norm. Compare this with
Exercise 11.4.8.

11.5.8 Let $(X, \mathcal{B}, \mathbf{P}, T)$ be a measure-preserving system. Let $\mathcal{Z}_1$ be the smallest
σ-algebra with respect to which all almost periodic functions are
measurable. If $f$ is bounded and measurable, show that $\|f\|_{U^2(X)} = 0$ if
and only if $\mathbf{E}(f | \mathcal{Z}_1) = 0$. (Hint: the “only if” part follows from the preceding
exercise. For the “if” part, construct the dual function
$\mathcal{D}_2 f := \lim_{N \to \infty} \mathbf{E}_{-N \le n \le N} T^n f \, \mathbf{E}(f \, T^n f | \mathcal{Z}_0)$, and show that this function is
almost periodic. You may need the fact that Volterra integral operators
are compact.)

11.5.9 (Koopman–von Neumann theorem) Let $(X, \mathcal{B}, \mathbf{P}, T)$ be a measure-preserving
system, and let $f \in L^2(X)$. Show that there is a unique decomposition
$f = f_{U^\perp} + f_U$, where $\|f_U\|_{U^2(X)} = 0$ and $f_{U^\perp}$ is the limit in
$L^2(X)$ of almost periodic functions.


11.6 The hypergraph approach 


In Section 10.6 we saw that the Szemerédi regularity lemma led to a result in 
graph theory, namely the triangle removal lemma, which in turn implied Roth’s 
theorem (as well as a generalization to right-angled triangles). It is then natural to 
ask whether a similar approach can prove Szemerédi’s theorem for more general 
k. This turns out to be the case, but requires one to work with hypergraphs (also
known as set systems) instead of graphs. We need some notation. If $A$ is a finite
set and $k \ge 0$, we let $\binom{A}{k}$ denote the collection of all the $k$-element subsets of $A$.
Define a $k$-uniform hypergraph $H = H(V, E)$ to be any pair $(V, E)$, where $V$ is a
finite set (the vertex set), and $E$ is a subset of $\binom{V}{k}$ (the edge set). Thus a 2-uniform
hypergraph is the same as an ordinary graph.

The triangle removal lemma, Lemma 10.46, can be generalized as follows. If
$H = H(V, E)$ is a $k$-uniform hypergraph, we define a $k$-simplex in $H$ to be any set
$S = \{v_1, \ldots, v_{k+1}\} \subset V$ of $k+1$ vertices such that $\binom{S}{k} \subset E$, i.e. the $k+1$ edges
$S \setminus \{v_1\}, \ldots, S \setminus \{v_{k+1}\}$ all lie in $E$. Note that a 2-simplex is the same as a triangle.
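The definition of a $k$-simplex translates directly into a counting routine. The sketch below is only an illustration of the definition (the name `count_simplices` and the representation of edges as collections of vertices are assumptions made here); it checks every $(k+1)$-element vertex set, so it is practical only for very small hypergraphs.

```python
from itertools import combinations


def count_simplices(vertices, edges, k):
    """Count the (k+1)-element sets S whose k-element subsets all lie in E."""
    E = {frozenset(e) for e in edges}
    count = 0
    for S in combinations(vertices, k + 1):
        S = frozenset(S)
        # S is a k-simplex iff removing any single vertex leaves an edge.
        if all(S - {v} in E for v in S):
            count += 1
    return count


# k = 2: a graph with exactly one triangle {0, 1, 2}.
triangles = count_simplices(range(4), [(0, 1), (1, 2), (0, 2), (2, 3)], 2)
# k = 3: the complete 3-uniform hypergraph on 5 vertices.
tetrahedra = count_simplices(range(5), combinations(range(5), 3), 3)
```

For $k = 2$ this is ordinary triangle counting, in line with the remark that a 2-simplex is a triangle.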


Theorem 11.27 (Simplex removal lemma) [283], [284], [140] Let $k \ge 2$, and
let $H = H(V, E)$ be a $k$-uniform hypergraph which contains at most $\delta |V|^{k+1}$
$k$-simplices. Then it is possible to remove $o_{\delta \to 0; k}(|V|^k)$ edges from $H$ to obtain a
hypergraph which is simplex-free (it contains no $k$-simplices whatsoever).


This result was conjectured by Erdős, Frankl and Rödl [87] in 1986, but not 
proven in full until much later. The k = 2 case dates back of course to [304] in 
1978, but even the k = 3 case did not appear until 2002 [110] (though unpublished 
versions of this result existed much earlier, see for instance [109]); see also [139]. 
The full result was proven independently and simultaneously by Rödl and Skokan 
[283], [284] (see also [282], [254]) and Gowers [140]. A slight strengthening of 
this result was later established in [360] for the purposes of establishing arbi- 
trary constellations in the Gaussian primes. For a recent survey of developments, 
see [281]. 

Just as Lemma 10.46 implies Proposition 10.47, Theorem 11.27 implies the 
following higher-dimensional analog: 




Proposition 11.28 Let $Z$ be a finite additive group, let $k \ge 2$, and let $A \subset Z^k$
be such that $A$ contains no “right-angled simplices” $x, x + re_1, \ldots, x + re_k$
with $x \in Z^k$ and $r \in Z \setminus \{0\}$, where (by slight abuse of notation) we write $re_i$
for $(0, \ldots, 0, r, 0, \ldots, 0)$ with the $i$th position being the only non-zero one. Then
$|A| = o_{|Z| \to \infty; k}(|Z|^k)$.
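As an illustration of the hypothesis in Proposition 11.28, the following sketch tests whether a set contains a right-angled simplex. It specializes to the cyclic group $Z = \mathbf{Z}_N$, which is an assumption made purely for concreteness; the function name is hypothetical.

```python
def has_right_angled_simplex(A, N, k):
    """Check whether A, a set of k-tuples over Z_N, contains
    x, x + r*e_1, ..., x + r*e_k for some x and some r != 0 (mod N)."""
    A = set(A)
    for x in A:
        for r in range(1, N):
            corners = (
                tuple((x[j] + (r if j == i else 0)) % N for j in range(k))
                for i in range(k)
            )
            if all(c in A for c in corners):
                return True
    return False


corner = has_right_angled_simplex({(0, 0), (1, 0), (0, 1)}, N=3, k=2)
diagonal = has_right_angled_simplex({(i, i) for i in range(3)}, N=3, k=2)
```

The diagonal example shows a set of size $|Z|$ with no such simplex; the proposition asserts that simplex-free sets must have size $o(|Z|^k)$.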


This in turn can be used to deduce Theorem 11.24 as well as Szemerédi’s theorem.
In fact it yields Szemerédi’s theorem for an arbitrary group: more precisely
we have $r_k(Z) = o_{|Z| \to \infty; k}(|Z|)$ for any finite additive group $Z$ with $|Z|$ coprime
to $(k-1)!$.

Just as the triangle removal lemma can be proven by the Szemerédi regularity 
lemma (indeed, this is currently its only proof known), the simplex removal lemma 
can be proven by a hypergraph regularity lemma. It turns out, however, that unlike 
the situation with the regularity lemma, where there is essentially one formulation 
(up to equivalences), there are several choices of hypergraph regularity lemma to 
choose from. The first such regularity lemma, introduced by Chung and Graham 
[58], regularizes the k-edge set E in terms of a partition of the vertex sets V, and is 
proven very similarly to the regularity lemma for graphs. Unfortunately, this lemma 
seems to be too weak to easily deduce the simplex removal lemma, the problem 
being that the regularity properties conferred by this lemma are not sufficient to 
obtain an accurate count for the number of simplices in the hypergraph, even in 
the 3-uniform case. The situation is intriguingly similar to the phenomenon noted 
in earlier sections that Fourier uniformity is insufficient to count progressions of 
length 4 or greater, even though Fourier analysis does not make an appearance in 
the regularity lemma. The solution (again in the 3-uniform case for simplicity) is
to regularize the 3-edges by a partition of the 2-edge set $\binom{V}{2}$, and then regularize
the 2-edge partition further (essentially using the ordinary regularity lemma) using
a partition of the vertex set $V$ (or equivalently $\binom{V}{1}$). This however leads to some
new issues not present in the ordinary graph case. First, it is possible for the
secondary partition to somehow disrupt the regularity obtained by the primary 
partition. Second, one has to decide on the relative strength of regularity between 
the primary partition and the secondary partition; this is particularly important 
since there is an expensive (tower-exponential) trade-off between the amount of 
regularity conferred by a partition, and the number of cells needed in the partition, 
and one may need the regularity in one partition to dominate the number of cells 
in another. Finally, even after all the appropriate regularity has been attained, one 
still needs to accurately count the number of simplices in the hypergraph. 

These problems can all be solved, but require a certain amount of technical- 
ity. We will not give the general details here, but we will do the k = 2 case (i.e. 
the usual regularity lemma) in detail, in a way which allows for a relatively easy 




extension to the higher k case. Our treatment here follows [360], [359] (see also 
[7], [282] for some closely related arguments). Namely, we will view the regu- 
larity lemma as being akin to the Koopman—von Neumann theorems of previous 
sections, decomposing a function into a “compact” component and a “uniform” 
component. Indeed we have the following analog of Proposition 11.18. Let us
say that a function $f : V_1 \times \cdots \times V_d \to \mathbf{R}^+$ is $K$-constant if for each $1 \le i \le d$
there exists a partition of $V_i$ into $K$ cells $V_{i,1}, \ldots, V_{i,K}$ (some of the cells may be
empty or otherwise unequal in size) such that $f$ is constant on each of the products
$V_{1,j_1} \times \cdots \times V_{d,j_d}$. This will be analogous to the concept of almost periodicity used
in previous sections.


Proposition 11.29 (Preliminary regularity lemma) [359] Let $V_1, \ldots, V_d$ be
arbitrary finite non-empty sets, let $f : V_1 \times \cdots \times V_d \to \mathbf{R}^+$ be such that
$0 \le f(x_1, \ldots, x_d) \le 1$, let $\sigma > 0$, and let $F : \mathbf{R}^+ \times \mathbf{R}^+ \to \mathbf{R}^+$ be an arbitrary function.
Then there exists a quantity $K = O_{\sigma, F}(1)$ and a decomposition $f = f_{U^\perp} + f_U$
with the following properties:

• the “anti-uniform” component $f_{U^\perp}$ obeys the bounds $0 \le f_{U^\perp} \le 1$.
Furthermore there exists a $K$-constant approximation $f_C$ to $f_{U^\perp}$ with
$0 \le f_C \le 1$ and $\|f_{U^\perp} - f_C\|_{L^2(V_1 \times \cdots \times V_d)} \le \sigma$;

• the “uniform” component $f_U$ obeys the regularity estimate

$$ |\mathbf{E}_{x_1 \in V_1, \ldots, x_d \in V_d} 1_{A_1 \times \cdots \times A_d} f_U(x_1, \ldots, x_d)| \le \frac{1}{F(\sigma, K)} $$

for all $A_1 \subset V_1, \ldots, A_d \subset V_d$.


Remark 11.30 For application to graph regularity one only needs d = 2. For more 
general d, the above lemma is closely related to the hypergraph regularity lemma 
of Chung and Graham [58]. 


Let us assume this proposition for the moment, and establish the regularity 
lemma in the more traditional formulation of Lemma 10.42. 


Proof of Lemma 10.42 assuming Proposition 11.29 Let $\varepsilon$, $m$, and $G = G(V, E)$
be as in the lemma. We set $V_1 = V_2 = V$, and let $f(x_1, x_2)$ be the incidence matrix
of $G$, i.e. $f(x_1, x_2) = \mathbf{I}(\{x_1, x_2\} \in E)$. Let $\sigma > 0$ be a small number to be chosen
later (it will eventually be a small multiple of $\varepsilon^4$), and let $F : \mathbf{R}^+ \times \mathbf{R}^+ \to \mathbf{R}^+$ be a
growth function (depending on $m$) to be chosen later. We apply Proposition 11.29
with $d = 2$ to obtain a decomposition $f = f_{U^\perp} + f_U$, a $K$-constant approximation
$f_C$ to $f_{U^\perp}$ for some $K = O_{\sigma, F}(1)$, and partitions $V_{1,1}, \ldots, V_{1,K}$ and $V_{2,1}, \ldots, V_{2,K}$
of $V$ with respect to which $f_C$ is constant. We can take the common refinement
$V'_{K(i-1)+j} := V_{1,i} \cap V_{2,j}$ for $i, j \in [1, K]$ to obtain a unified partition $V'_1, \ldots, V'_{K^2}$
such that $f_C$ is constant on each product $V'_i \times V'_j$. Next, we let $N$ be the largest
integer less than $\sigma |V| / (100 m K^2)$ (we will assume $V$ large enough depending on
$m, K, \sigma$ that $N > 1$), and partition each cell $V'_i$ arbitrarily into disjoint sub-cells of
size $N$, plus an error of size at most $N$. Note that all the errors, when put together,
will give an error set $V_*$ of size at most $K^2 N \le \sigma |V| / 100m$, while the remaining
sub-cells yield a partition $V''_1, \ldots, V''_k$ of the remaining set $V \setminus V_*$, with each $V''_i$
having cardinality exactly $N$. Thus we have

$$ |V| - \frac{\sigma |V|}{100 m} \le kN \le |V|, $$

thus $k \ge m$ and $k = \Theta(|V|/N) = \Theta(m K^2 / \sigma)$. The partition $V''_1, \ldots, V''_k, V_*$ is
not quite near-uniform in the sense of Section 10.6, because of the exceptional set
$V_*$. But we can break up $V_*$ arbitrarily into $k$ near-uniform pieces and distribute
them to the $V''_i$, replacing each set $V''_i$ by a slightly larger set $V'''_i$ such that the
$V'''_1, \ldots, V'''_k$ are a near-uniform partition of $V$ and

$$ |V'''_i \setminus V''_i| = O\left(\frac{|V_*|}{k} + 1\right) = O\left(\frac{K^2 N}{k} + 1\right) = O(\sigma N) $$

where we again assume $V$ to be suitably large depending on $m, K, \sigma$. Thus $V'''_i$ is
only larger than $V''_i$ by a factor of $1 + O(\sigma)$.

Let us now investigate the $\varepsilon$-regularity of a pair $V'''_i, V'''_j$. Let $X \subset V'''_i$, $Y \subset V'''_j$
be such that $|X| \ge \varepsilon |V'''_i|$ and $|Y| \ge \varepsilon |V'''_j|$, in particular $|X|, |Y| \ge \varepsilon N$. We wish
to see if

$$ |d(X, Y) - d(V'''_i, V'''_j)| \le \varepsilon; $$

this will follow if we can show

$$ \left| \mathbf{E}_{x_1 \in V'''_i, x_2 \in V'''_j} f(x_1, x_2) \left( 1_X(x_1) 1_Y(x_2) - \frac{|X|}{|V'''_i|} \frac{|Y|}{|V'''_j|} \right) \right| \le \varepsilon^3 \tag{11.33} $$

for arbitrary subsets $X \subset V'''_i$, $Y \subset V'''_j$ (with no lower bound on cardinality). We
make some preliminary reductions. Observe that we may restrict $X$ to $V''_i$ and $Y$
to $V''_j$, and replace the range of $x_1$ and $x_2$ to $V''_i$ and $V''_j$ respectively, and only
incur an error of $O(\sigma)$ on the left-hand side. Similarly we may replace $|X|/|V'''_i|$
and $|Y|/|V'''_j|$ by $|X|/N$ and $|Y|/N$ and again only accept an error of $O(\sigma)$, thus
estimating the left-hand side of (11.33) by

$$ \left| \mathbf{E}_{x_1 \in V''_i, x_2 \in V''_j} f(x_1, x_2) \left( 1_X(x_1) 1_Y(x_2) - \frac{|X|}{N} \frac{|Y|}{N} \right) \right| + O(\sigma). $$

Now since $f_C$ is constant on $V''_i \times V''_j$ we have

$$ \mathbf{E}_{x_1 \in V''_i, x_2 \in V''_j} f_C(x_1, x_2) \left( 1_X(x_1) 1_Y(x_2) - \frac{|X|}{N} \frac{|Y|}{N} \right) = 0. $$

Next, by the uniformity of $f_U$ we have

$$ \left| \mathbf{E}_{x_1 \in V''_i, x_2 \in V''_j} f_U(x_1, x_2) \left( 1_X(x_1) 1_Y(x_2) - \frac{|X|}{N} \frac{|Y|}{N} \right) \right| \le \left( 1 + \frac{|X|}{N} \frac{|Y|}{N} \right) \frac{|V|^2}{N^2} \frac{1}{F(\sigma, K)} = O\left( \frac{m^2 K^4}{\sigma^2 F(\sigma, K)} \right). $$

By choosing $F$ suitably (e.g. $F(\sigma, K) = m^2 K^4 / \sigma^3$) we can ensure that the right-hand
side is $O(\sigma)$. Putting this all together, we can bound the left-hand side of
(11.33) by

$$ \left| \mathbf{E}_{x_1 \in V''_i, x_2 \in V''_j} (f_{U^\perp} - f_C)(x_1, x_2) \left( 1_X(x_1) 1_Y(x_2) - \frac{|X|}{N} \frac{|Y|}{N} \right) \right| + O(\sigma). $$

By the triangle inequality, this is less than

$$ 2 \mathbf{E}_{x_1 \in V''_i, x_2 \in V''_j} |(f_{U^\perp} - f_C)(x_1, x_2)| + O(\sigma). $$

Now from Proposition 11.29 and Cauchy–Schwarz we have

$$ \mathbf{E}_{x_1 \in V, x_2 \in V} |(f_{U^\perp} - f_C)(x_1, x_2)| \le \sigma $$

which after trimming away the exceptional set $V_*$ gives

$$ \mathbf{E}_{x_1, x_2 \in V''_1 \cup \cdots \cup V''_k} |(f_{U^\perp} - f_C)(x_1, x_2)| = O(\sigma). $$

By Markov’s inequality (and the uniform sizes of the $V''_i$) we conclude that

$$ \mathbf{E}_{x_1 \in V''_i, x_2 \in V''_j} |(f_{U^\perp} - f_C)(x_1, x_2)| = O(\sigma / \varepsilon) $$

for all but at most $\varepsilon k^2$ of the pairs $(i, j)$. In such a case we obtain a bound of
$O(\sigma / \varepsilon)$ for the left-hand side of (11.33), which will be acceptable by choosing
$\sigma$ equal to a small multiple of $\varepsilon^4$. Finally, the bound $K = O_{\sigma, F}(1)$ now implies
a bound $k = O_{\varepsilon, m}(1)$ as required. This establishes the lemma (with the partition
$V'''_1, \ldots, V'''_k$).
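The combinatorial bookkeeping in the proof above (cutting each cell into sub-cells of size $N$, collecting the leftovers into an exceptional set, and redistributing that set) can be sketched as follows. This is a minimal illustration only: the function name is hypothetical, the leftovers are distributed round-robin as one possible "arbitrary" choice, and "near-uniform" is taken here to mean that cell sizes differ by at most one.

```python
def near_uniform_partition(cells, N):
    """Cut each cell into sub-cells of size N, gather the leftovers into an
    exceptional set, then spread that set evenly over the sub-cells."""
    subcells, exceptional = [], []
    for cell in cells:
        cell = list(cell)
        full = len(cell) // N
        for t in range(full):
            subcells.append(cell[t * N:(t + 1) * N])
        exceptional.extend(cell[full * N:])  # fewer than N leftovers per cell
    if not subcells:
        raise ValueError("N is too large: no sub-cell of size N fits")
    # Round-robin distribution keeps the enlarged cells within one of each other.
    for idx, v in enumerate(exceptional):
        subcells[idx % len(subcells)].append(v)
    return subcells, exceptional


cells = [range(0, 7), range(7, 12), range(12, 20)]
parts, leftover = near_uniform_partition(cells, N=3)
```

In the proof the exceptional set has size at most $K^2 N$, so each enlarged cell exceeds its core of size $N$ by only $O(\sigma N)$ elements once $V$ is large.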














Note that in the above proof only a very specific choice of function $F$ was
needed. However, the ability to set the function $F$ arbitrarily becomes very important
in the hypergraph theory, as it is the easiest way to reconcile the problem
mentioned earlier of needing to have the regularity control given by one partition 
dominate the number of cells of another partition without totally losing control of 
all the error terms. Of course the price one pays for this is that the total number of 
cells at the end of the argument becomes extremely large. 

We now begin the proof of Proposition 11.29. The reader may wish to focus on
the $d = 2$ case for sake of familiarity, although the general $d$ case is no different.
We will re-interpret the partitions $V_{i,1}, \ldots, V_{i,K}$ of $V_i$ as σ-algebras $\mathcal{B}_i$ on $V_i$ for
$1 \le i \le d$, which induces a further σ-algebra $\mathcal{B}_1 \otimes \cdots \otimes \mathcal{B}_d$ on $V_1 \times \cdots \times V_d$,
formed by the Cartesian products $V_{1,i_1} \times \cdots \times V_{d,i_d}$. Note in particular that the
function $f_C := \mathbf{E}(f | \mathcal{B}_1 \otimes \cdots \otimes \mathcal{B}_d)$ will be a $K$-constant function between 0 and
1. The decomposition $f = f_{U^\perp} + f_U$ will be given by $f_{U^\perp} := \mathbf{E}(f | \mathcal{B}'_1 \otimes \cdots \otimes \mathcal{B}'_d)$
and $f_U := f - f_{U^\perp}$, where the $\mathcal{B}'_i$ are somewhat finer σ-algebras than the $\mathcal{B}_i$. The
exact choice of $\mathcal{B}_i$ and $\mathcal{B}'_i$ will be determined by an energy increment algorithm
very similar to that used to prove Proposition 10.36.

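We turn to the details. We fix $V_1, \ldots, V_d$ and the function $f : V_1 \times \cdots \times V_d \to \mathbf{R}^+$.
Given any σ-algebras $\mathcal{B}_1, \ldots, \mathcal{B}_d$ of $V_1, \ldots, V_d$, we define the energy
$\mathcal{E}(\mathcal{B}_1 \otimes \cdots \otimes \mathcal{B}_d)$ by

$$ \mathcal{E}(\mathcal{B}_1 \otimes \cdots \otimes \mathcal{B}_d) := \|\mathbf{E}(f | \mathcal{B}_1 \otimes \cdots \otimes \mathcal{B}_d)\|_{L^2(V_1 \times \cdots \times V_d)}^2; $$

thus the energy ranges between 0 and 1 and finer σ-algebras have higher energy.
Just as Proposition 10.36 relied on Lemma 10.40, Proposition 11.29 will rely on
the following analog.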


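Lemma 11.31 (Lack of uniformity implies energy increment) Let $\mu > 0$ and
$K' \ge 1$, and for each $1 \le i \le d$ let $\mathcal{B}'_i$ be a σ-algebra on $V_i$ with at most $K'$ atoms
each such that

$$ |\mathbf{E}_{x_1 \in V_1, \ldots, x_d \in V_d} 1_{A_1 \times \cdots \times A_d} (f - \mathbf{E}(f | \mathcal{B}'_1 \otimes \cdots \otimes \mathcal{B}'_d))(x_1, \ldots, x_d)| \ge \mu $$

for some $A_1 \subset V_1, \ldots, A_d \subset V_d$. Then for each $1 \le i \le d$ there exists a finer
σ-algebra $\mathcal{B}''_i$ than $\mathcal{B}'_i$ with at most $2K'$ atoms each such that

$$ \mathcal{E}(\mathcal{B}''_1 \otimes \cdots \otimes \mathcal{B}''_d) \ge \mathcal{E}(\mathcal{B}'_1 \otimes \cdots \otimes \mathcal{B}'_d) + \mu^2. $$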


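Proof For $1 \le i \le d$, let $\mathcal{B}''_i$ be the σ-algebra generated by $\mathcal{B}'_i$ and $A_i$. Observe
that

$$ \mathbf{E}_{x_1 \in V_1, \ldots, x_d \in V_d} 1_{A_1 \times \cdots \times A_d} (f - \mathbf{E}(f | \mathcal{B}''_1 \otimes \cdots \otimes \mathcal{B}''_d))(x_1, \ldots, x_d) = 0 $$

since $A_1 \times \cdots \times A_d$ is the union of atoms in $\mathcal{B}''_1 \otimes \cdots \otimes \mathcal{B}''_d$, on each of which
$f - \mathbf{E}(f | \mathcal{B}''_1 \otimes \cdots \otimes \mathcal{B}''_d)$ has mean zero. Subtracting this from the hypothesis we
conclude

$$ |\mathbf{E}_{x_1 \in V_1, \ldots, x_d \in V_d} 1_{A_1 \times \cdots \times A_d} (\mathbf{E}(f | \mathcal{B}''_1 \otimes \cdots \otimes \mathcal{B}''_d) - \mathbf{E}(f | \mathcal{B}'_1 \otimes \cdots \otimes \mathcal{B}'_d))(x_1, \ldots, x_d)| \ge \mu $$

and hence by Cauchy–Schwarz

$$ \|\mathbf{E}(f | \mathcal{B}''_1 \otimes \cdots \otimes \mathcal{B}''_d) - \mathbf{E}(f | \mathcal{B}'_1 \otimes \cdots \otimes \mathcal{B}'_d)\|_{L^2(V_1 \times \cdots \times V_d)} \ge \mu. $$

The claim then follows from Pythagoras’ theorem.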
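The energy increment in Lemma 11.31 can be checked numerically in the case $d = 2$. The sketch below represents a σ-algebra on $V_i = \{0, \ldots, |V_i| - 1\}$ by a list of atom labels, which is a representation chosen here for convenience and not taken from the text; all function names are hypothetical.

```python
def cond_exp(f, lab1, lab2):
    """E(f | B1 (x) B2): replace f by its average on each product of atoms."""
    sums, counts = {}, {}
    for x1, row in enumerate(f):
        for x2, val in enumerate(row):
            key = (lab1[x1], lab2[x2])
            sums[key] = sums.get(key, 0.0) + val
            counts[key] = counts.get(key, 0) + 1
    return [[sums[(lab1[x1], lab2[x2])] / counts[(lab1[x1], lab2[x2])]
             for x2 in range(len(row))] for x1, row in enumerate(f)]


def energy(f, lab1, lab2):
    """Squared L^2 norm (with normalized counting measure) of E(f | B1 (x) B2)."""
    g = cond_exp(f, lab1, lab2)
    return sum(v * v for row in g for v in row) / (len(f) * len(f[0]))


def correlation(f, lab1, lab2, A1, A2):
    """E 1_{A1 x A2} (f - E(f | B1 (x) B2)), the quantity in the hypothesis."""
    g = cond_exp(f, lab1, lab2)
    total = sum(f[x1][x2] - g[x1][x2] for x1 in A1 for x2 in A2)
    return total / (len(f) * len(f[0]))


def refine(lab, A):
    """Atoms of the sigma-algebra generated by the old one together with A."""
    return [(lab[x], x in A) for x in range(len(lab))]


# Hypothetical example: two disjoint complete blocks, trivial sigma-algebras.
f = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
trivial = [0, 0, 0, 0]
A = {0, 1}
mu = abs(correlation(f, trivial, trivial, A, A))
before = energy(f, trivial, trivial)
after = energy(f, refine(trivial, A), refine(trivial, A))
```

In this example `mu` is 0.125 and the energy rises from 0.25 to 0.5, comfortably more than the guaranteed increment $\mu^2$.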







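Proof of Proposition 11.29 This will be almost identical to Proposition 10.36.
We construct a nested pair of σ-algebras $\mathcal{B}_i \subset \mathcal{B}'_i$ on $V_i$ for each $1 \le i \le d$ and an
integer $K \ge 1$ by the following double-loop algorithm.

• Step 0. Initialize $\mathcal{B}_i = \{\emptyset, V_i\}$ for each $i$.

• Step 1. Let $K$ be the smallest integer such that each of the $\mathcal{B}_i$ have at most $K$
atoms. Set $\mathcal{B}'_i := \mathcal{B}_i$ for each $i$; thus we trivially have

$$ \mathcal{E}(\mathcal{B}'_1 \otimes \cdots \otimes \mathcal{B}'_d) \le \mathcal{E}(\mathcal{B}_1 \otimes \cdots \otimes \mathcal{B}_d) + \sigma^2. $$

• Step 2. If

$$ |\mathbf{E}_{x_1 \in V_1, \ldots, x_d \in V_d} 1_{A_1 \times \cdots \times A_d} (f - \mathbf{E}(f | \mathcal{B}'_1 \otimes \cdots \otimes \mathcal{B}'_d))(x_1, \ldots, x_d)| \le \frac{1}{F(\sigma, K)} $$

for all $A_1 \subset V_1, \ldots, A_d \subset V_d$, then we terminate the algorithm. If not, then we
can apply Lemma 11.31 to obtain for each $1 \le i \le d$ a new σ-algebra $\mathcal{B}''_i$ with
at most twice as many atoms as $\mathcal{B}'_i$ such that

$$ \mathcal{E}(\mathcal{B}''_1 \otimes \cdots \otimes \mathcal{B}''_d) \ge \mathcal{E}(\mathcal{B}'_1 \otimes \cdots \otimes \mathcal{B}'_d) + \frac{1}{F(\sigma, K)^2}. $$

• Step 3. If we have

$$ \mathcal{E}(\mathcal{B}''_1 \otimes \cdots \otimes \mathcal{B}''_d) \le \mathcal{E}(\mathcal{B}_1 \otimes \cdots \otimes \mathcal{B}_d) + \sigma^2 $$

then we set $\mathcal{B}'_i := \mathcal{B}''_i$ and return to Step 2. If instead we have

$$ \mathcal{E}(\mathcal{B}''_1 \otimes \cdots \otimes \mathcal{B}''_d) > \mathcal{E}(\mathcal{B}_1 \otimes \cdots \otimes \mathcal{B}_d) + \sigma^2 $$

then we set $\mathcal{B}_i := \mathcal{B}''_i$ and return to Step 1.

Once the algorithm terminates we set $f_{U^\perp} := \mathbf{E}(f | \mathcal{B}'_1 \otimes \cdots \otimes \mathcal{B}'_d)$,
$f_C := \mathbf{E}(f | \mathcal{B}_1 \otimes \cdots \otimes \mathcal{B}_d)$, and $f_U := f - f_{U^\perp}$. The verification that the algorithm does
indeed terminate in finite time and gives the desired properties is almost identical
to the analogous arguments in Proposition 10.36 and is left to the reader as an
exercise.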
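The double-loop algorithm can be run literally on very small examples. The following is a sketch for $d = 2$ only, under several assumptions made here for illustration: σ-algebras are stored as lists of atom labels, the test in Step 2 is a brute-force search over all pairs of subsets (exponential, so only tiny vertex sets are feasible), and the sets $A_i$ returned by that search are the ones used for the refinement of Lemma 11.31. All names are hypothetical.

```python
from itertools import combinations


def subsets(n):
    """All subsets of range(n), as tuples."""
    for size in range(n + 1):
        yield from combinations(range(n), size)


def cond_exp(f, lab1, lab2):
    """E(f | B1 (x) B2): average of f over each product of atoms."""
    sums, counts = {}, {}
    for x1, row in enumerate(f):
        for x2, val in enumerate(row):
            key = (lab1[x1], lab2[x2])
            sums[key] = sums.get(key, 0.0) + val
            counts[key] = counts.get(key, 0) + 1
    return [[sums[(lab1[x1], lab2[x2])] / counts[(lab1[x1], lab2[x2])]
             for x2 in range(len(row))] for x1, row in enumerate(f)]


def energy(f, labs):
    g = cond_exp(f, *labs)
    return sum(v * v for row in g for v in row) / (len(f) * len(f[0]))


def find_violation(f, labs, threshold):
    """Return (A1, A2) violating the regularity estimate, or None."""
    g = cond_exp(f, *labs)
    n = len(f) * len(f[0])
    for A1 in subsets(len(f)):
        for A2 in subsets(len(f[0])):
            corr = sum(f[x][y] - g[x][y] for x in A1 for y in A2) / n
            if abs(corr) > threshold:
                return set(A1), set(A2)
    return None


def refine(lab, A):
    """Labels of the sigma-algebra generated by the old one and A."""
    return [(lab[x], x in A) for x in range(len(lab))]


def regularize(f, sigma, F):
    coarse = ([0] * len(f), [0] * len(f[0]))            # Step 0
    while True:
        K = max(len(set(lab)) for lab in coarse)        # Step 1
        fine = coarse
        while True:
            bad = find_violation(f, fine, 1.0 / F(sigma, K))   # Step 2
            if bad is None:
                return coarse, fine, K
            finer = tuple(refine(lab, A) for lab, A in zip(fine, bad))
            if energy(f, finer) <= energy(f, coarse) + sigma ** 2:  # Step 3
                fine = finer        # back to Step 2
            else:
                coarse = finer      # back to Step 1
                break


# Hypothetical input: a 4 x 4 incidence-type matrix and a mild growth function.
f = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
sigma = 0.25
F = lambda s, K: 8.0 * K
coarse, fine, K = regularize(f, sigma, F)
```

On termination, `cond_exp(f, *coarse)` plays the role of $f_C$ and `cond_exp(f, *fine)` that of $f_{U^\perp}$. The inner loop can run only boundedly often because each pass raises the energy by at least $1/F(\sigma, K)^2$ while staying within $\sigma^2$ of the coarse energy, and the outer loop because the coarse energy rises by more than $\sigma^2$ each time and is bounded by 1.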














Remark 11.32 A closer inspection of the above argument shows that the number
of atoms $K$ in the σ-algebra $\mathcal{B}$ can increase from $K$ to as much as roughly $K 2^{\sigma^2 F(\sigma, K)^2}$
whenever we return from Step 3 to Step 1. Since the latter step can occur as
often as $1/\sigma^2$ times, we see that the final complexity will most likely be a tower
exponential in $K$ or worse (unless we restrict $F$ to have logarithmic growth or so,
see [239] for some discussion of this type of lemma). As mentioned in Section
10.6, this tower-exponential behavior is unavoidable, see [136].


Now we discuss the extension of the regularity lemma to hypergraphs. To
simplify the exposition we shall consider only the 3-uniform case. First it turns out
that a minor modification of the proof of Proposition 11.29 yields the following
variant, in which we obtain much stronger uniformity control (with respect to
arbitrary 2-edge sets rather than vertex sets), but with a weaker and more complex
notion of $K$-constancy. More precisely, let us say that a function
$f : V_1 \times V_2 \times V_3 \to \mathbf{R}^+$ is $(K, 2)$-constant if there exist partitions
$V_i \times V_j = E_{ij,1} \cup \cdots \cup E_{ij,K}$ for $ij = 12, 23, 31$ such that $f$ is constant on each set

$$ \{(x_1, x_2, x_3) \in V_1 \times V_2 \times V_3 : (x_i, x_j) \in E_{ij, a_{ij}} \text{ for all } ij = 12, 23, 31\} $$

for all $a_{12}, a_{23}, a_{31} \in [1, K]$.


Proposition 11.33 (Preliminary hypergraph regularity lemma) [360] Let
$V_1, V_2, V_3$ be arbitrary finite non-empty sets, let $f : V_1 \times V_2 \times V_3 \to \mathbf{R}^+$ be
such that $0 \le f(x_1, x_2, x_3) \le 1$, let $\sigma > 0$, and let $F : \mathbf{R}^+ \times \mathbf{R}^+ \to \mathbf{R}^+$ be an
arbitrary function. Then there exists a quantity $K = O_{\sigma, F}(1)$ and a decomposition
$f = f_{U^\perp} + f_U$ with the following properties:

• the “anti-uniform” component $f_{U^\perp}$ obeys the bounds $0 \le f_{U^\perp} \le 1$.
Furthermore there exists a $(K, 2)$-constant approximation $f_C$ to $f_{U^\perp}$ with
$0 \le f_C \le 1$ and $\|f_{U^\perp} - f_C\|_{L^2(V_1 \times V_2 \times V_3)} \le \sigma$;

• the “uniform” component $f_U$ obeys the regularity estimate

$$ |\mathbf{E}_{x_1 \in V_1, x_2 \in V_2, x_3 \in V_3} 1_{A_{12}}(x_1, x_2) 1_{A_{23}}(x_2, x_3) 1_{A_{31}}(x_3, x_1) f_U(x_1, x_2, x_3)| \le \frac{1}{F(\sigma, K)} $$

for all $A_{12} \subset V_1 \times V_2$, $A_{23} \subset V_2 \times V_3$, $A_{31} \subset V_3 \times V_1$.


The proof of Proposition 11.33 is almost identical to that of Proposition 11.29 
but with a somewhat heavier notational burden. We leave it as an exercise. 

One can then deduce a full-strength regularity lemma for 3-uniform hypergraphs.
The exact formulation of this lemma is rather messy and too unenlightening
to be given here (see [139], [283], [284], [282], [360]), but we will describe
the formulation indirectly by informally outlining the proof of the lemma, following
[360]. Given a function $f : V_1 \times V_2 \times V_3 \to \mathbf{R}^+$, an initial error tolerance $\sigma$,
and a growth function $F : \mathbf{R}^+ \times \mathbf{R}^+ \to \mathbf{R}^+$, we then decide upon a much faster
growth function $F^{\mathrm{fast}} : \mathbf{R}^+ \times \mathbf{R}^+ \to \mathbf{R}^+$, the exact choice of which will be made
later. Applying Proposition 11.33 with this much faster growth function $F^{\mathrm{fast}}$ we
obtain a primary decomposition $f = f_{U^\perp} + f_U$, where $f_U$ is extremely regular
with respect to 2-edge partitions (enjoying the fast function $F^{\mathrm{fast}}$ in the denominator),
and $f_{U^\perp}$ is approximable by a $(K, 2)$-constant function $f_C$, where $K$ has some
(rather lousy) upper bound. The $(K, 2)$-constant function can be described using $O(K)$
edge sets $E_{ij,a}$ in $V_i \times V_j$ for $ij = 12, 23, 31$. We then apply Proposition 11.29
to the indicators $1_{E_{ij,a}} : V_i \times V_j \to \mathbf{R}^+$ of each of these edge sets $E_{ij,a}$, using the
original function $F$, and replacing the error tolerance $\sigma$ by something smaller, e.g.
$1/F(\sigma, K)$. Strictly speaking, we need a “multiple function” or “vector-valued”
version of Proposition 11.29 in which we regularize multiple functions simultaneously
using a single partition, but this is not hard to set up. This gives us a new
parameter $K' := O_{K, F, \sigma}(1)$, such that we have secondary decompositions of each
of the indicator functions $1_{E_{ij,a}}$ into a $K'$-constant main term and some manageable
errors. Finally, we choose $F^{\mathrm{fast}}$ so that $F^{\mathrm{fast}}(\sigma, K)$ dominates any expression that
will arise from $K'$; this basically means that $F^{\mathrm{fast}}$ is a tower-iterated version of
$F$, and will ensure that the error term $f_U$ in the primary decomposition is also
manageable.

Thus to summarize (and glossing over the delicate issues regarding the relative 
sizes of various parameters), we start with a function $f(x_1, x_2, x_3)$ of three variables,
and approximate it by a combination of $O(K)$ functions $1_{E_{ij,a}}(x_i, x_j)$ of two
variables, plus manageable errors; we then approximate each of the $1_{E_{ij,a}}(x_i, x_j)$
by a combination of $O(K')$ functions of one variable (i.e. the indicators of the vertex
classes), again plus manageable errors. With carefully chosen relative sizes of
parameters as given above, this regularization of the original function f is suitable 
for such tasks as accurately counting the number of 3-simplices in a 3-uniform 
hypergraph, in a manner similar in spirit to (but somewhat lengthier than) the proof 
of Lemma 10.46. This in turn eventually leads to a proof of Theorem 11.27, which 
in turn implies Szemerédi’s theorem and a number of other consequences. 


Exercises 


11.6.1 Deduce Proposition 11.28 from Theorem 11.27. (Hint: the vertex set $V$
for the $k$-uniform hypergraph should consist of coordinate hyperplanes
such as $\{(x_1, \ldots, x_k) : x_i = \mathrm{const}\}$, as well as the diagonal hyperplanes
$\{(x_1, \ldots, x_k) : x_1 + \cdots + x_k = \mathrm{const}\}$.)

11.6.2 Use Proposition 11.28 to deduce Theorem 11.24. 

11.6.3 Use Proposition 11.28 to deduce the claim $r_k(Z) = o_{|Z| \to \infty; k}(|Z|)$ whenever
$|Z|$ is coprime to $(k-1)!$.

11.6.4 [139] Let $V, W$ be disjoint sets of $n$ vertices each. Let us color the vertices
in $V$ red or blue randomly and independently, with equal probability of
each. Suppose we also color the edges in $W$ (i.e. the elements of $\binom{W}{2}$)
red or blue randomly and independently. Let $H = H(V \cup W, E)$ be the
3-uniform hypergraph with edge set consisting of all triples $\{v, w, w'\}$,
where $v \in V$ and $\{w, w'\} \in \binom{W}{2}$ have the same color, together with all
triples of the form $\{v, v', w\}$ with $\{v, v'\} \in \binom{V}{2}$ and $w \in W$. Let us also
define the competing 3-uniform hypergraph $H' = H'(V \cup W, E')$, where
$E'$ consists of all the triples of the form $\{v, v', w\}$ with $\{v, v'\} \in \binom{V}{2}$
and $w \in W$, and with each triple of the form $\{v, w, w'\}$ with $v \in V$ and
$\{w, w'\} \in \binom{W}{2}$ belonging to $E'$ with independent probability 1/2. Show
that with large probability, the number of 3-edges joining any three large
subsets $A, B, C$ of $V \cup W$ is about the same for $H$ and $H'$, but that $H$ and
$H'$ have very different numbers of 3-simplices. (Of course, one should
quantify these vague statements precisely, for instance using Chernoff’s
inequality.) This shows that regularization based entirely on vertex partition
will not be sufficient to easily conclude the simplex removal lemma.
11.6.5 Prove Proposition 11.33. 


11.7 Arithmetic progressions in the primes 


We now discuss the Green—Tao theorem, Theorem 10.7. We will not give a com- 
plete proof of this theorem here, referring the reader to the original paper [158] 
and to the survey articles [358], [217], [184], [153], [361] for further details. 
Instead we shall give a somewhat informal discussion, in particular focusing on 
the connections with the other arguments discussed in this chapter. 

We begin with a very brief history of the problem. This result has been conjectured
for some time; indeed, long progressions of primes were already studied
by Lagrange and Waring in 1770. The Erdős–Turán conjecture (Conjecture 10.6),
formulated in 1936, was certainly motivated in part by this problem; it implies
Theorem 10.7 but is much stronger (and still open). The first significant progress on the
problem was in 1939, when Van der Corput [370] used Fourier-analytic methods
(but not the density increment or energy increment arguments) to establish that the
primes contained infinitely many progressions of length three. A key step of the
argument is to obtain good bounds for exponential sums such as $\mathbf{E}_{1 \le n \le N} \Lambda(n) e(\alpha n)$,
where $\Lambda$ is the von Mangoldt function and $\alpha$ is a real number (which may be
close to a rational with small denominator, or far away from one). However, as
discussed earlier, Fourier methods (also known as the Hardy-Littlewood circle
method in analytic number theory) do not directly work for progressions of length
4 or higher. Progress on this problem thus became very slow. Szemerédi’s theorem
did not directly give any new results on the primes, as they had density zero, and
even the powerful quantitative bounds of Bourgain (Theorem 10.30) for $k = 3$ and
Gowers (11.23) were insufficient to attack the primes (which would require a
bound roughly of the form $r_k(\mathbf{Z}_N) = o(N \log \log N / \log N)$).

Meanwhile, the methods of sieve theory were developed by analytic number 
theorists, in part to solve questions concerning the existence of patterns of primes 
such as arithmetic progressions. While these methods seem unable by themselves 
to count primes directly (due to the notorious parity problem in sieve theory, the 
discussion of which is beyond the scope of this book), they have proven to be 
enormously successful in counting almost-primes — products of very few primes. 




For instance, it is not too hard to use sieve theory methods to show that for any
given $k$, there are infinitely many progressions of length $k$, the elements of which
are each the product of $O_k(1)$ prime factors. However to pass from the almost-primes
to the primes remained difficult; one notable result is that of Heath-Brown
[179] in 1981, who showed that there were infinitely many progressions of length
4 where three elements were prime and the fourth was the product of at most
two primes. In another direction, Balog [15] in 1992 was able to find infinitely
many $k$-tuples of primes $p_1, \ldots, p_k$ whose midpoints $(p_i + p_j)/2$ were also prime.
Meanwhile, in 1996, Kohayakawa, Łuczak, and Rödl [212] extended the Szemerédi
regularity lemma to subgraphs of a certain type of random subgraph, and in so 
doing extended Roth’s theorem to show that relatively dense subsets of a random 
set contained many progressions of length 3 (see Theorem 10.18). More recently, 
Green [147] used Fourier methods to obtain a Roth theorem for the primes, in other 
words showing that any subset of the primes of positive relative density contained 
infinitely many arithmetic progressions of length 3. This was then refined by Green 
and Tao [159], who showed (roughly speaking) that any dense subset of a set 
which was well controlled by a sieve would contain infinitely many progressions 
of length 3. 

In [158] this type of result was extended to arbitrary k. The precise statement 
requires some notation. 


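Definition 11.34 (Pseudo-random measure) [158] A function $\nu : \mathbf{Z}_N \to \mathbf{R}^+$ is
said to be $k$-pseudo-random if we have $\mathbf{E}_{\mathbf{Z}_N} \nu = 1 + o_{N \to \infty}(1)$, and more generally
we have the linear forms condition

$$ \mathbf{E}_{x_1, \ldots, x_t \in \mathbf{Z}_N} \prod_{i=1}^{m} \nu\left( \sum_{j=1}^{t} L_{ij} x_j + b_i \right) = 1 + o_{N \to \infty; k}(1) $$

whenever $0 \le m \le k 2^{k-1}$, $t \le 3k - 4$, and $b_1, \ldots, b_m \in \mathbf{Z}_N$ are arbitrary, and $L_{ij}$
are rational numbers with numerator and denominator of magnitude at most $k$, such
that none of the $m$ $t$-tuples $(L_{ij})_{j=1}^{t}$ are rational multiples of any other. Furthermore
we assume the correlation condition

$$ \mathbf{E}_{x \in \mathbf{Z}_N} \prod_{i=1}^{m} \nu(x + h_i) \le \sum_{1 \le i < j \le m} \tau(h_i - h_j) $$

for all $1 \le m \le 2^{k-1}$ and all $h_1, \ldots, h_m \in \mathbf{Z}_N$, where $\tau : \mathbf{Z}_N \to \mathbf{R}^+$ is a function
obeying the moment conditions $\mathbf{E}_{\mathbf{Z}_N} \tau^q = O_q(1)$ for all $1 \le q < \infty$.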


The above definition is rather complicated, but one should view these conditions
as an assertion that the weight function (or “measure”) $\nu$ is very randomly
distributed. If we have $\nu = \frac{1}{\mathbf{P}(A)} 1_A$ for some set $A \subset \mathbf{Z}_N$, these conditions are
essentially asserting that the events $\sum_j L_{ij} x_j + b_i \in A$ are essentially independent
of each other if the $(L_{ij})_{j=1}^{t}$ are not commensurate, and the events $x + h_i \in A$
are only mildly correlated to each other for generic choices of $h_1, \ldots, h_m$.

The key result in [158] then takes the Szemerédi theorem, in the form of The- 
orem 11.1, and generalizes it to pseudo-random measures. 


Theorem 11.35 (Relative Szemerédi theorem) Let $k \ge 3$, and let $\mathbf{Z}_N$ be a finite
cyclic group of large prime order $N$. If $f : \mathbf{Z}_N \to \mathbf{R}^+$ is a non-negative function
which is not identically zero, and obeys the bounds $0 \le f(x) \le \nu(x)$ for all
$x \in \mathbf{Z}_N$ and $\mathbf{E}_{\mathbf{Z}_N}(f) \ge \delta > 0$ for some $k$-pseudo-random measure $\nu$, then

$$ \Lambda_k(f, \ldots, f) = \Omega_{k, \delta}(1) - o_{N \to \infty; k, \delta}(1). $$


This strengthening of Szemerédi’s theorem allows one to detect arithmetic 
progressions not just in sets of positive density, but now also in sets of positive 
relative density with respect to sufficiently “pseudo-random” sets, even if the latter 
sets have density zero. For instance, given any set $B \subset \mathbf{Z}_N$ for which $\frac{1}{\mathbf{P}(B)} 1_B$ is
$k$-pseudo-random, the above theorem will guarantee that $r_k(B) = o_{N \to \infty; k}(|B|)$,
provided one has a mild condition such as $\mathbf{P}(B) \ge N^{-1/k}$ in order to neglect the
diagonal $r = 0$ term in $\Lambda_k(f, \ldots, f)$. In particular, any subset $A$ of $B$ of large
relative density |A|/|B| > ô will contain a proper arithmetic progression of length 
k as soon as N is sufficiently large depending on 6 and k. 
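The role of the diagonal $r = 0$ term is easy to see numerically. The sketch below assumes the normalization $\Lambda_k(f, \ldots, f) = \mathbf{E}_{x, r \in \mathbf{Z}_N} f(x) f(x + r) \cdots f(x + (k-1) r)$, which is our reading of the notation of Theorem 11.1 and should be checked against that statement; the function names are hypothetical.

```python
def Lambda_k(f, k):
    """Average over x, r in Z_N of f(x) f(x+r) ... f(x+(k-1)r),
    where f is given as a list of length N."""
    N = len(f)
    total = 0.0
    for x in range(N):
        for r in range(N):
            term = 1.0
            for j in range(k):
                term *= f[(x + j * r) % N]
            total += term
    return total / N ** 2


def diagonal_term(f, k):
    """Contribution of r = 0 to Lambda_k(f, ..., f)."""
    N = len(f)
    return sum(v ** k for v in f) / N ** 2


# Indicator of B = {0, 1, 2} inside Z_7, and 3-term progressions.
N = 7
indicator = [1.0 if x in (0, 1, 2) else 0.0 for x in range(N)]
full = Lambda_k(indicator, 3)
diag = diagonal_term(indicator, 3)
```

Here 5 of the 49 pairs $(x, r)$ contribute and 3 of those have $r = 0$, so only the difference `full - diag` detects proper progressions. For a normalized indicator $\frac{1}{\mathbf{P}(B)} 1_B$ the diagonal term is $\mathbf{P}(B)^{1-k} / N$, which is why a lower bound on $\mathbf{P}(B)$ is needed to neglect it.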

As it turns out, the primes $P$ do not quite fall into the above framework,
because they are unevenly distributed with respect to small residue classes (e.g.
they are almost all odd), and any set $B$ containing $P$ for which $P$ has positive
relative density will also necessarily have some uneven distribution in small
residue classes (this is ultimately due to the divergence of the Euler product
$\prod_p \frac{p}{p-1}$). On the other hand, pseudo-random measures are necessarily
evenly distributed among such classes (see exercises). However, this can be easily
fixed, by the simple trick of using the pigeonhole principle to pass to a single
residue class among small divisors. More precisely, one defines
$W := \prod_{p < w} p$ for some small $w$ (e.g. $w = \log\log N$ will suffice), and
replaces the primes $P$ by the set
$P_{W,b,N} := \{n \in [\varepsilon_k N, 2\varepsilon_k N] : Wn + b \in P\}$ for some
$b$ coprime to $W$ (in fact one can use Dirichlet’s theorem on distribution of primes
in residue classes to take $b = 1$). Here $\varepsilon_k := 1/2^k (k+4)!$ is a small
number needed for some minor technical reasons (related to the denominators of the
$L_{ij}$ in the $k$-pseudo-random condition). See [158], [361] for more details of
this “W-trick”.
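The effect of the W-trick on small residue classes can be seen numerically. The following is a minimal sketch only: the function names, the tiny value $w = 5$, and the search range are illustrative assumptions, and the restriction to the interval $[\varepsilon_k N, 2\varepsilon_k N]$ is ignored.

```python
from math import prod


def primes_below(limit):
    """Return the list of primes p < limit (sieve of Eratosthenes)."""
    if limit < 3:
        return []
    sieve = [True] * limit
    sieve[0] = sieve[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, limit, i):
                sieve[j] = False
    return [i for i, is_p in enumerate(sieve) if is_p]


def w_trick(w, b, lo, hi):
    """Return (W, S): W is the product of the primes below w, and
    S lists the n in [lo, hi] for which W*n + b is prime."""
    W = prod(primes_below(w))  # the empty product is 1
    prime_set = set(primes_below(W * hi + b + 1))
    return W, [n for n in range(lo, hi + 1) if W * n + b in prime_set]


def residue_counts(values, q):
    """Count how many of the values fall in each residue class mod q."""
    counts = [0] * q
    for v in values:
        counts[v % q] += 1
    return counts


# Illustrative (hypothetical) parameters: w = 5, so W = 2 * 3 = 6, and b = 1.
primes = primes_below(1000)
W, S = w_trick(w=5, b=1, lo=1, hi=200)
print(residue_counts(primes, 2))  # the primes are almost all odd
print(W, residue_counts(S, 2), residue_counts(S, 3))  # S meets every class mod 2 and 3
```

The primes themselves sit almost entirely in one class mod 2, whereas the rescaled set meets both classes, which is the unevenness the trick is designed to remove.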

It turns out that $P_{W,b,N}$ can be contained effectively in a $k$-pseudo-random
measure. More precisely, there exists a $k$-pseudo-random measure
$\nu : Z_N \to \mathbf{R}^+$ such that $\mathbf{E}_{Z_N} 1_{P_{W,b,N}} \nu = \Theta_k(1)$,
and also one has the mild upper bound $\|\nu\|_{L^\infty(Z_N)} = O(N^{1/k})$ (again
needed in order to neglect the $r = 0$ diagonal term). This fact, combined with
Theorem 11.1, is enough to establish arithmetic progressions of length $k$ in the
primes, and even to establish the stronger result that
$r_k(P \cap [1, N]) = o_{N\to\infty;k}(|P \cap [1, N]|) = o_{N\to\infty;k}(N/\log N)$.
The construction of this measure relies on a version of the Selberg sieve used by
Goldston and Yildirim [134], [132], [133] (see also [363], [184], [361]); it is purely
number-theoretical in nature and we do not reproduce it here. However, we do remark
that $\nu$ can be thought of as being a (smoothed out) version of the normalized
indicator function on the almost-primes
$P_k = \{n : n \text{ is the product of } O_k(1) \text{ primes}\}$, or more precisely of
the portion of $P_k$ in the residue class $b \pmod{W}$. As mentioned earlier, modern
sieve theory techniques such as the Selberg sieve are very accurate at counting
correlations of almost-primes, and thus can verify the $k$-pseudo-randomness of $\nu$
by fairly standard arguments. In contrast, verifying the $k$-pseudo-randomness of a
normalized counting function of the primes themselves (or of a related object such as
$P_{W,b,N}$) is still beyond the reach of current technology, being roughly equivalent
to the notorious Hardy–Littlewood prime tuples conjecture, which would imply not just
the Green–Tao theorem but also the twin prime conjecture, Goldbach’s conjecture, and
many other difficult and unsolved problems in additive number theory. Thus one
crucially needs a tool such as the relative Szemerédi theorem to bridge the gap
between the almost-primes (which we understand quite well) and the primes (which are
still very mysterious).

We briefly discuss the proof of Theorem 11.35. It turns out that this theorem
is proven by means very similar to the proof of Szemerédi’s theorem outlined in
Section 11.4, but now the functions involved are not bounded by 1, but are instead
bounded by some $k$-pseudo-random measure $\nu$. Nevertheless, it is still possible to
adapt most of the arguments in that section (with the exception of the useful
$UAP^{k-2}$ norms, which do not seem to have a suitable analog in this setting).
First of all one can generalize the generalized von Neumann theorem (11.8) to obtain
the bound

$$|\Lambda_k(f_0, \ldots, f_{k-1})| = O_k\Big(\min_{0 \le j \le k-1} \|f_j\|_{U^{k-1}(Z_N)}\Big) + o_{N\to\infty;k}(1) \qquad (11.34)$$

whenever $f_0, \ldots, f_{k-1} : Z_N \to \mathbf{R}$ are bounded in magnitude by
$\nu + 1$. The original bound (11.8) was proven using multiple applications of the van
der Corput lemma, which in turn is essentially just the Cauchy–Schwarz inequality;
similarly, the bound (11.34) is also proven using several applications of the
Cauchy–Schwarz inequality, the main task being to keep track of all the weights
involving $\nu$ and to use the linear forms condition to ensure that after a certain
point these weights can be replaced by 1 with only a negligible error. See [158] for
full details.

The bound (11.34) tells us that even in the pseudo-random setting, functions
which are Gowers uniform of order $k - 2$ can still be safely ignored. This opens
the way to prove Theorem 11.35 by using a Koopman–von Neumann theorem.
Here, the relevant theorem is as follows.


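Proposition 11.36 (Generalized Koopman–von Neumann structure theorem)
[158] Let $\nu$ be a $k$-pseudo-random measure, and let $f : Z_N \to \mathbf{R}^+$ be
such that $0 \le f(x) \le \nu(x)$ for all $x \in Z_N$. Let $0 < \varepsilon \ll 1$ be a
small parameter, and assume $N > N_0(\varepsilon)$ is sufficiently large. Then there
exists a $\sigma$-algebra $\mathcal{B}$ and an exceptional set $\Omega \in \mathcal{B}$
such that:

• (smallness condition)

$$\mathbf{E}(\nu 1_\Omega) = o_{N\to\infty;\varepsilon,k}(1); \qquad (11.35)$$

• ($\nu$ is uniformly distributed outside of $\Omega$)

$$\|(1 - 1_\Omega)\mathbf{E}(\nu - 1 | \mathcal{B})\|_{L^\infty(Z_N)} = o_{N\to\infty;\varepsilon,k}(1); \qquad (11.36)$$

and

• (Gowers uniformity estimate)

$$\|(1 - 1_\Omega)(f - \mathbf{E}(f | \mathcal{B}))\|_{U^{k-1}(Z_N)} \le \varepsilon^{1/2^k}. \qquad (11.37)$$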


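Assuming this proposition, one can now write $(1 - 1_\Omega) f = f_U + f_{U^\perp}$,
where $f_U := (1 - 1_\Omega)(f - \mathbf{E}(f | \mathcal{B}))$ is Gowers uniform of
order $k - 2$, and $f_{U^\perp} := (1 - 1_\Omega)\mathbf{E}(f | \mathcal{B})$ is bounded
by $1 + o_{N\to\infty;\varepsilon,k}(1)$ (since
$\mathbf{E}(f | \mathcal{B}) \le 1 + \mathbf{E}(\nu - 1 | \mathcal{B})$) and
non-negative. Furthermore by using (11.35) one can show that $f_{U^\perp}$ almost has
the same mean as $f$:
$\mathbf{E}_{Z_N} f_{U^\perp} = \mathbf{E}_{Z_N} f - o_{N\to\infty;\varepsilon,k}(1)$.
From the latter two facts one can use the ordinary Szemerédi theorem (Theorem 11.1)
to establish that

$$\Lambda_k(f_{U^\perp}, \ldots, f_{U^\perp}) = \Omega_{k,\delta}(1) - o_{N\to\infty;k,\delta}(1).$$

Since $f_U$ is Gowers uniform, we can easily use (11.34) to then conclude

$$\Lambda_k(f_{U^\perp} + f_U, \ldots, f_{U^\perp} + f_U) = \Omega_{k,\delta}(1) - o_{N\to\infty;k,\delta}(1)$$

and Theorem 11.35 then follows since $0 \le f_{U^\perp} + f_U \le f$.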

It thus only remains to prove Proposition 11.36. Here we follow the energy 
increment strategy already used to prove Propositions 10.36, 11.18, and 11.29. 
The first step is the following generalization of Lemma 11.14: 


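Lemma 11.37 (Soft inverse theorem) [158] Let $f : Z_N \to \mathbf{C}$ be a function
bounded in magnitude by $\nu + 1$, and let $F = \mathcal{D}_{k-1}(f)$ be the dual
function. Then $\|F\|_{L^\infty(Z_N)} \le 2^{2^{k-1}-1} + o_{N\to\infty;k}(1)$.
Furthermore, if $\|f\|_{U^{k-1}(Z_N)} \ge \eta$, then
$\langle f, F \rangle \ge \eta^{2^{k-1}}$.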


The key feature here is that even though $f$ may be unbounded (or at least very
large), the dual function $F$ is bounded quite concretely. This is a consequence of
the linear forms condition, which among other things provides a uniform bound
for $\mathcal{D}_{k-1}(\nu + 1)$ and hence for $\mathcal{D}_{k-1}(f)$.

One can then run the same energy increment algorithm used in Propositions
10.36, 11.18, 11.29, to convert any lack of uniformity in the $f_U$ term into a dual
function which is then added to a $\sigma$-algebra in order to increase the energy of
the $f_{U^\perp}$ term. The only difficulty with executing this strategy is to ensure
that $f_{U^\perp}$ stays bounded. This is accomplished by the following somewhat
technical result.


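Proposition 11.38 [158] Let $\nu$ be a $k$-pseudo-random measure. Let
$0 < \varepsilon < 1$ and $0 < \eta < 1/2$ be parameters. Then to every function
$F : Z_N \to \mathbf{R}$ bounded in magnitude by $\nu + 1$, one can construct a
$\sigma$-algebra $\mathcal{B}_{\varepsilon,\eta}(\mathcal{D}_{k-1} F)$ with the
following property: for any $K \ge 1$ and any functions
$F_1, \ldots, F_K : Z_N \to \mathbf{R}$ bounded in magnitude by $\nu + 1$, if we set
$\mathcal{B} := \mathcal{B}_{\varepsilon,\eta}(\mathcal{D}_{k-1} F_1) \vee \cdots \vee \mathcal{B}_{\varepsilon,\eta}(\mathcal{D}_{k-1} F_K)$,
then if $\eta < \eta_0(\varepsilon, K)$ is sufficiently small and
$N > N_0(\varepsilon, K, \eta)$ is sufficiently large we have

$$\|\mathcal{D}_{k-1} F_j - \mathbf{E}(\mathcal{D}_{k-1} F_j | \mathcal{B})\|_{L^\infty(Z_N)} \le \varepsilon \text{ for all } 1 \le j \le K. \qquad (11.38)$$

Furthermore there exists a set $\Omega$ which lies in $\mathcal{B}$ such that

$$\mathbf{E}_{Z_N}((\nu + 1) 1_\Omega) = O_{K,\varepsilon}(\eta^{1/2}) \qquad (11.39)$$

and such that

$$\|(1 - 1_\Omega)\mathbf{E}(\nu - 1 | \mathcal{B})\|_{L^\infty(Z_N)} = O_{K,\varepsilon}(\eta^{1/2}). \qquad (11.40)$$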


The $\sigma$-algebras $\mathcal{B}_{\varepsilon,\eta}(\mathcal{D}_{k-1} F)$ are
constructed very similarly to those in Proposition 10.38, the only real difference
being that certain small atoms cause some difficulty and need to be placed in the
exceptional set $\Omega$. However these problems can be dealt with by taking $\eta$
suitably small depending on $K$, $\varepsilon$, and then $N$ suitably large depending
on $K$, $\varepsilon$, $\eta$. The trickiest task is to establish (11.40). This
ultimately comes down (using the Weierstrass approximation theorem as in the
proof of Proposition 10.38) to establishing estimates of the form

$$\mathbf{E}((\nu - 1)\mathcal{D}_{k-1} F_1 \cdots \mathcal{D}_{k-1} F_K) = o_{N\to\infty;k,K}(1)$$

whenever $F_1, \ldots, F_K : Z_N \to \mathbf{R}$ are functions bounded in magnitude by
$\nu + 1$. This estimate turns out to be achievable by application of the
Gowers–Cauchy–Schwarz inequality, Hölder’s inequality, and both the linear forms and
correlation conditions; see [158].

Finally, we apply the energy increment argument and combine Lemma 11.37
and Proposition 11.38 as in the proof of Proposition 10.36 to obtain Proposition
11.36. Actually the energy increment argument here is slightly simpler than that in
Proposition 10.36 as there is no arbitrary growth function $F$ to deal with. As such
one can use just a single loop iterative procedure rather than a double loop, which
simplifies things slightly. On the other hand, the presence of the exceptional sets,
and the unboundedness of several of the functions being manipulated, requires
some additional care, in particular to ensure that one really does get a substantial
energy increment at each stage in order to make the algorithm terminate in finite
time (and to keep the quantity $K$ appearing in Proposition 11.38 bounded by
$O_\varepsilon(1)$).


Exercises

11.7.1 Suppose that one knew that
$r_k(Z_N) = o_{N\to\infty;k}(N \log\log N/\log N)$ for all $k \ge 3$. Derive the
Green–Tao theorem as a consequence of this. (Hint: divide the primes from 1 to $N$
into residue classes mod $P = \prod_{p < c \log N} p$ for some small absolute constant
$c$, and use the pigeonhole principle (and Proposition 1.51) to conclude that the
primes in one of these classes have density roughly $\log\log N/\log N$.)

11.7.2 Use Theorem 11.35 to prove a version of Theorem 10.18 for large cyclic groups
$Z_N$ and arbitrary $k$. (Hint: if $B$ is a random subset of $Z_N$ with expected
density $\tau \ge N^{-\varepsilon}$ for some small $\varepsilon = \varepsilon_k > 0$,
show using Chernoff’s inequality that $\frac{1}{\tau} 1_B$ is very likely to be
$k$-pseudo-random.)

11.7.3 [158] Let $\nu : Z_N \to \mathbf{R}^+$ be $k$-pseudo-random. Show that
$\|\nu - 1\|_{U^{k-1}(Z_N)} = o_{N\to\infty;k}(1)$. Conclude in particular that if
$k \ge 3$, then one has the uniform distribution property

$$\mathbf{E}_{x \in Z_N} 1_P(x)\nu(x) = \mathbf{P}_{Z_N}(P) + o_{N\to\infty;k}(1)$$

for any arithmetic progression $P$. Thus pseudo-random measures must be evenly
distributed in arithmetic progressions.

11.7.4 [158] Prove Lemma 11.37.


12 





Long arithmetic progressions in sum sets 


12.1 Introduction 


One general theme throughout this book is that sum sets $A + B$ are more structured
than arbitrary sets $A$, $B$, and in particular that iterated sum sets such as
$lA = \{a_1 + \cdots + a_l : a_i \in A\}$ should get increasingly structured as $l$
gets larger. One example of this phenomenon is Lemma 4.13, which shows that if $A$ has
small Fourier bias then $lA$ quickly fills out the entire ambient group. (See also
Exercise 4.3.12 for a related demonstration of special structure of sum sets.) For
another example, let $A$ be a subset of $[1, n]$ for some large $n$ and consider (as a
measure of structure) the longest progression contained inside $A$. If $A$ has no
structure other than density, e.g. $|A| \ge 0.99n$, then there is not much we can say.
Even the powerful quantitative version (11.23) of Szemerédi’s theorem due to Gowers
can only obtain an arithmetic progression of length
$\Omega(\log\log\log\log\log n)$. For cubes the situation is somewhat better (and
simpler); Lemma 10.49 guarantees that $A$ contains a proper cube of dimension
$\Omega(\log\log n)$, though this is still far from the maximal dimension
$O(\log n)$ of the cubes inside $[1, n]$.

The situation improves markedly with taking sum sets, though. First, if
$A \subset [1, n]$ has cardinality at least $0.99n$, then it is easy to see that
$A + A$ or $A - A$ contains an arithmetic progression of length $0.98n$. This is of
course a rather extreme case, but more generally if $A \subset [1, n]$ is such that
$|A| \ge \delta n$, then Bourgain’s theorem (Theorem 4.47) shows that $A + A$ and
$A - A$ contain proper arithmetic progressions of length at least
$\exp(\Omega_\delta(\log^{1/3} n))$. For $3A$ and $2A - A$, Exercise 4.7.1 shows that
these sets in fact contain proper arithmetic progressions of length
$\Omega_\delta(n)$, while Theorem 4.43 shows that these sets contain proper
generalized arithmetic progressions of rank $O_\delta(1)$ and volume
$\Theta_\delta(n)$. For $2A - 2A$, Chang’s theorem (Theorem 4.42) gives similar results
but with better dependence on $\delta$.

These results however require A to be rather dense inside the ambient interval 


[1, n]; even Chang’s theorem requires A to have density aegen) in order to be 




non-trivial. If $A$ is sparser than this, one can still ask what happens to sum sets
such as $lA$ when $l$ gets large. One can show fairly easily (see exercises) that for
fixed $A \subset \mathbf{Z}$ and $l$ very large, $lA$ essentially coalesces into a
single long arithmetic progression, plus some negligible terms at the boundary. For
other groups, the situation is slightly different (again, see exercises); note that
unlike the situation with small $l$, Freiman homomorphisms are much rarer when $l$
gets very large and one cannot identify the asymptotic behavior of $lA$ as
$l \to \infty$ for non-isomorphic ambient groups. See [260], [261] for some more
advanced results in this direction.

Now we ask a more quantitative question. Suppose we are given positive integers
$l, m, n$ with $2 \le m \le n$ (the case $m = 1$ will be too degenerate to consider).
If $A$ is an arbitrary subset of $[1, n]$ with cardinality $|A| = m$, what can we say
about the structure of $lA$, and more precisely, what is the largest arithmetic
progression (or generalized arithmetic progression) one can find inside $lA$? From the
above discussion we expect to find quite a large progression when $l$ is large. For
instance, from the work of Lev [226] one has the following result:


Theorem 12.1 [226] Let $A \subset [1, n]$ be such that $|A| > 2$ and $A$ is not
contained in any progression of step greater than 1. Let $l$ be such that
$l \ge 2(n - 1)/(|A| - 2)$. Then $lA$ contains an interval $[m + 1, m + n]$ for some
integer $m$.
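As a sanity check, the statement of Theorem 12.1 can be tested by brute force on a small set. This is only an illustrative sketch: the set $A = \{1, 2, 5, 9\}$ and the helper names are hypothetical choices, not taken from [226].

```python
def iterated_sumset(A, l):
    """Return lA = {a_1 + ... + a_l : a_i in A} (repetitions allowed)."""
    sums = {0}
    for _ in range(l):
        sums = {s + a for s in sums for a in A}
    return sums


def longest_interval(S):
    """Length of the longest run of consecutive integers contained in S."""
    best = 0
    for s in S:
        if s - 1 not in S:  # s is the left endpoint of a run
            length = 1
            while s + length in S:
                length += 1
            best = max(best, length)
    return best


# Hypothetical example: A is not contained in a progression of step > 1.
A = {1, 2, 5, 9}
n = 9
l = 8  # satisfies l >= 2(n - 1)/(|A| - 2) = 16/2 = 8
print(longest_interval(iterated_sumset(A, l)) >= n)
```

Here $8A$ contains the interval $[28, 36]$, which has length $n = 9$, as the theorem predicts.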


In fact more precise statements are available; see [227]. An earlier result of 
Sárközy [307] established the following weaker result: 


Theorem 12.2 [226] There exists an absolute constant $C > 0$ such that the
following holds. Let $A \subset [1, n]$ and $l \ge 1$ be such that $|A| \ge 2$ and
$l|A| \ge Cn$. Then $lA$ contains a proper arithmetic progression of length
$\Omega(l|A|)$.


We shall prove this theorem as a special case of more general results below. 

We can phrase the above theorems in a different way. For any $l, m, n$ with
$2 \le m \le n$, we define $f(m, l, n)$ to be the largest integer such that, for every
subset $A \subset [1, n]$ of cardinality $|A| \ge m$, $lA$ contains a proper arithmetic
progression of length $f(m, l, n)$. The question is now to determine the size of
$f(m, l, n)$ for various values of $m, l, n$. Theorem 12.2 asserts that
$f(m, l, n) = \Omega(lm)$ whenever $lm \ge Cn$. In fact we have
$f(m, l, n) = \Theta(lm)$ in this case, as can be seen by considering the set
$A = [1, m]$.

This gives a satisfactory answer to the question when $m$ is large compared to
$n/l$. It is now natural to ask whether this threshold $n/l$ is sharp, and what
happens to $f(m, l, n)$ for $m$ below this threshold. It turns out in this case that
the upper bound for $f(m, l, n)$ drops dramatically:


Lemma 12.3 (Upper bound on f) Let $d \ge 1$ be an integer, and let $l, m, n$ be
positive integers such that $l \ge 2$ and $n \ge 2^d l^{d-1} m^d$. Then
$f(m^d, l, n) \le lm - l + 1$.


472 12 Long arithmetic progressions in sum sets 


Proof Let $A$ be the rank $d$ progression

$$A := [1, m]^d \cdot (1, 2lm, \ldots, (2lm)^{d-1}),$$

thus

$$lA = [l, lm]^d \cdot (1, 2lm, \ldots, (2lm)^{d-1}).$$

Then by summing the geometric series we see that $A \subset [1, n]$. From the base
$2lm$ representation of the integers we see that the map $\phi : [l, lm]^d \to lA$
defined by $\phi(x_0, \ldots, x_{d-1}) := \sum_{j=0}^{d-1} x_j (2lm)^j$ is a Freiman
homomorphism of order 2. The same argument shows that $A$ is proper, so that
$|A| = m^d$. From Proposition 5.24 we thus see that the length of the longest
arithmetic progression in $lA$ is the same as the length of the longest arithmetic
progression in $[l, lm]^d$, which is clearly $lm - l + 1$. The claim follows.
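The construction in this proof is easy to check by brute force for small parameters. The sketch below uses the hypothetical values $d = 2$, $l = 2$, $m = 3$ and helper names chosen only for illustration; it builds $A$, forms $lA$, and searches for the longest arithmetic progression.

```python
from itertools import product


def gap_example(d, l, m):
    """The progression A = [1,m]^d . (1, 2lm, ..., (2lm)^(d-1)) from the proof."""
    base = 2 * l * m
    return {sum(x * base ** j for j, x in enumerate(xs))
            for xs in product(range(1, m + 1), repeat=d)}


def iterated_sumset(A, l):
    """Return lA = {a_1 + ... + a_l : a_i in A} (repetitions allowed)."""
    sums = {0}
    for _ in range(l):
        sums = {s + a for s in sums for a in A}
    return sums


def longest_ap(S):
    """Length of the longest arithmetic progression (positive step) inside S."""
    S = set(S)
    elems = sorted(S)
    best = 1 if elems else 0
    for i, a in enumerate(elems):
        for b in elems[i + 1:]:
            step = b - a
            if a - step in S:
                continue  # a is not the first term of this progression
            length = 2
            while a + length * step in S:
                length += 1
            best = max(best, length)
    return best


d, l, m = 2, 2, 3                      # illustrative parameters only
A = gap_example(d, l, m)
lA = iterated_sumset(A, l)
print(len(A), max(A), len(lA), longest_ap(lA))
```

With these parameters $|A| = m^d = 9$, $\max A = 39 \le 2^d l^{d-1} m^d = 72$, and the longest progression in $lA$ has length $lm - l + 1 = 5$, even though $l|A| = 18$.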














From this lemma (and the trivial observation that $f(m, l, n)$ is monotone
increasing in $m$) we see that there exist constants $c_d$ for $d = 1, 2, 3, \ldots$
such that $f(m, l, n) = O(l m^{1/d})$ whenever $m \le c_d \frac{n}{l^{d-1}}$. Thus the
upper bounds for $f(m, l, n)$ exhibit a thresholding behavior in $m$ near the points
$n/l, n/l^2, n/l^3, \ldots$. Somewhat remarkably, these thresholds are sharp up to
constants:


Theorem 12.4 [350] Let $d \ge 1$. Then there exists a constant $C_d > 0$ such that
for any $l \ge 1$ and $A \subset [1, n]$ with $|A| \ge C_d \frac{n}{l^d}$ and
$|A| \ge 2$, $lA$ contains a proper arithmetic progression of length
$\Omega_d(l|A|^{1/d})$.


Note that Theorem 12.2 already gives the $d = 1$ case of this theorem. Combining
this with the preceding discussion, we see that $f(m, l, n) = \Theta_d(l m^{1/d})$
whenever $C_d \frac{n}{l^d} \le m \le c_d \frac{n}{l^{d-1}}$. This settles the question
of determining the magnitude of $f(m, l, n)$ as long as $m$ is well away from the
thresholds $n/l$, $n/l^2$, etc. and is not too small. The precise behavior near these
thresholds is still unclear, and may be difficult to resolve.

We will prove Theorem 12.4 (and hence Theorem 12.2) in the following sections. A
key observation is that up to constants, one only needs to consider the case when $l$
is a power of 2, in which case one can view $lA$ as an iteration of the doubling
operation $A \mapsto A + A$. This gives the problem a certain dynamic flavor, in which
we analyze the evolution of a set under the doubling map. We then discuss extensions
and variants, in particular to restricted sum sets

$$l^*A := \{a_1 + \cdots + a_l : a_1, \ldots, a_l \in A, \text{ distinct}\}$$

and finite sum sets

$$FS(A) := \bigcup_{l=0}^{\infty} l^*A = \Big\{\sum_{x \in B} x : B \subset A, 0 \le |B| < \infty\Big\},$$

where we now allow $A$ to possibly be infinite. This in particular will be used to
resolve some conjectures of Erdős and Folkman on complete sequences; we also
present some other applications.
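For a finite set the three kinds of sum set just introduced can be enumerated directly. The following is a minimal sketch for illustration (the function names are our own, and the empty subset is counted as contributing the sum 0 to the finite sum set); it is only practical for very small $A$.

```python
from itertools import combinations, combinations_with_replacement


def sumset(A, l):
    """lA: sums of l elements of A, repetitions allowed."""
    return {sum(c) for c in combinations_with_replacement(sorted(A), l)}


def restricted_sumset(A, l):
    """l*A: sums of l distinct elements of A."""
    return {sum(c) for c in combinations(sorted(A), l)}


def finite_sums(A):
    """FS(A): sums over all finite subsets B of A (the empty subset gives 0)."""
    result = set()
    for l in range(len(A) + 1):
        result |= restricted_sumset(A, l)
    return result


A = {1, 2, 4}
print(sorted(sumset(A, 2)))             # [2, 3, 4, 5, 6, 8]
print(sorted(restricted_sumset(A, 2)))  # [3, 5, 6]
print(sorted(finite_sums(A)))           # [0, 1, 2, 3, 4, 5, 6, 7]
```

The example shows that $l^*A$ can be much smaller than $lA$, since the diagonal sums such as $1 + 1$ are excluded.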


Exercises

12.1.1 Let $A$ be an additive set in a cyclic group $Z_p$ of prime order. Show that
$lA = Z_p$ whenever $l(|A| - 1) \ge p - 1$, and that this condition is best possible.
(Hint: use the Cauchy–Davenport inequality, Theorem 5.4.) Thus when $|A| \ge 2$, the
iterated sum sets stabilize to the entire group after at most $(p - 1)/(|A| - 1)$
summations.

12.1.2 Let $A$ be an additive set in a finite additive group $Z$. Show that there
exists a subgroup $G$ of $Z$ such that $lA$ is a coset of $G$ for all sufficiently
large $l$. If $A$ contains 0, show that we in fact have $lA = \langle A \rangle$
whenever $l|A| \ge 2|\langle A \rangle|$, where $\langle A \rangle$ is the group
generated by $A$. (Hint: quotient out by the symmetry group $\mathrm{Sym}_1(lA)$ and
then apply Kneser’s theorem, Theorem 5.5.) Thus in this case the iterated sum sets
stabilize to a group after at most $2|Z|/|A|$ summations.

12.1.3 Let $A$ be an additive set of integers. Show that if $l$ is sufficiently large
depending on $A$, then $lA$ is a proper arithmetic progression of length
$\Theta_A(l)$ together with at most $O_A(1)$ additional elements. (Hint: this
statement is only non-trivial for very large $l$. Use the Chinese remainder theorem.
It may be useful to reduce to the case when $A$ has smallest element zero, and has no
common divisor.)

12.1.4 Prove Theorem 12.2 in the case when $l$ is extremely large compared to $A$
and $n$.

12.1.5 Let $A$ be an additive set in $\mathbf{Z}^d$ that contains the origin 0. Let
$B$ be the convex hull of $A$ in $\mathbf{R}^d$, and let $\Gamma$ be the sub-lattice
of $\mathbf{Z}^d$ spanned by $A$. Show that for all large $l$ we have

$$((l - o_{l\to\infty;A}(l)) \cdot B) \cap \Gamma \subseteq lA \subseteq ((l + o_{l\to\infty;A}(l)) \cdot B) \cap \Gamma.$$

How is this statement modified when $A$ does not contain the origin?

12.1.6 Show that $f(m, l, n) \le lm - l + 1$ for all $l \ge 1$ and $2 \le m \le n$.

12.1.7 Show that $f(m, l, n) = l$ whenever $n \ge (2l)^{m-1}$. (Hint: for the upper
bound, consider $A = (2l)^{[0, m-1]} = \{1, 2l, (2l)^2, \ldots, (2l)^{m-1}\}$.)
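The claim of Exercise 12.1.1 can be explored numerically before attempting a proof. The sketch below is only an experiment under illustrative assumptions (the prime $p = 13$ and the two sets are arbitrary choices): it checks the Cauchy–Davenport lower bound $|lA| \ge \min(p, l(|A| - 1) + 1)$ and shows that an arithmetic progression attains it.

```python
def iterated_sumset_mod(A, l, p):
    """Return lA inside the cyclic group Z_p."""
    sums = {0}
    for _ in range(l):
        sums = {(s + a) % p for s in sums for a in A}
    return sums


def fills_group(A, l, p):
    """True if lA is all of Z_p."""
    return len(iterated_sumset_mod(A, l, p)) == p


p = 13
progression = {0, 1, 2}   # extremal case: an arithmetic progression
generic = {0, 1, 5}       # an arbitrary three-element set

for l in range(1, 8):
    bound = min(p, l * (len(generic) - 1) + 1)
    print(l,
          len(iterated_sumset_mod(progression, l, p)),
          len(iterated_sumset_mod(generic, l, p)),
          bound)
```

For the progression, $lA$ fills $Z_{13}$ exactly when $2l \ge 12$, which illustrates why the condition $l(|A| - 1) \ge p - 1$ cannot be weakened.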


12.2 Proof of Theorem 12.4 


To prove Theorem 12.4 it turns out to be convenient to prove a stronger result.
Observe that in the example given in Lemma 12.3, the sum set $lA$ not only contains an
arithmetic progression, it in fact contains a much larger generalized arithmetic
progression. This phenomenon turns out to be quite general:


Theorem 12.5 [350] For any fixed positive integer $d$ there is a constant $C_d > 0$
such that the following holds. For any positive integers $n$ and $l$ and any set
$A \subset [1, n]$ satisfying $l^d|A| \ge C_d n$, $lA$ contains a proper progression of
rank $d'$ and volume at least $\Omega_d(l^{d'}|A|)$, for some integer
$1 \le d' \le d$.


It is easy to see that this implies Theorem 12.4, and we leave this as an exercise. 
The example in Lemma 12.3 shows that one can have d’ = d; the simple example 
A = [1, m] also shows that one can have d’ = 1. Of course intermediate values of 
d' are also possible. 

We now prove Theorem 12.5. We begin with a version of this theorem for 
progressions. 


Lemma 12.6 (Coalescence of progressions) Let $P$ be a proper progression of
integers of rank at most $d$, and let $l \ge 1$ be an integer. Then $lP$ contains a
proper progression of rank $d'$ and volume at least $\Omega_d(l^{d'}|P|)$ for some
$1 \le d' \le d$.


Remark 12.7 It is instructive to experiment with the sum sets $lP$ for a proper
progression $P$ of rank $d$ as $l \to \infty$, e.g. the progression
$P = [1, m]^4 \cdot (1, N_1, N_1 N_2, N_1 N_2 N_3)$ where
$m \ll N_1 \ll N_2 \ll N_3$. At first, the sum sets $lP$ will remain proper of rank $d$
(and grow polynomially in size, like $l^d$). But at some point there will be a
“collision”, causing the sum set to essentially “coalesce” into a progression of rank
$d - 1$ or less (and thus grow somewhat more slowly in $l$). After a finite number of
collisions, the sum set will coalesce into a single arithmetic progression (plus
negligible terms), at which point it only grows linearly in $l$. The proof of Lemma
12.6 below can be used to formalize this intuitive picture but we will not do so here
as the notation required is somewhat complicated. This result is also closely related
to Minkowski’s second theorem (Theorem 3.30), as well as the other machinery in
Section 3.5.
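The experiment suggested in this remark is quick to run for a small rank 2 analog. In the sketch below the progression $P = [1, 3]^2 \cdot (1, 10)$ and the helper name are hypothetical choices made only to keep the computation tiny.

```python
def sumset_sizes(P, l_max):
    """Return [|1P|, |2P|, ..., |l_max P|] for a finite set of integers P."""
    sizes = []
    current = {0}
    for _ in range(l_max):
        current = {s + x for s in current for x in P}
        sizes.append(len(current))
    return sizes


# Hypothetical rank 2 example: P = [1, m]^2 . (1, N) with m = 3, N = 10.
m, N = 3, 10
P = {x0 + N * x1 for x0 in range(1, m + 1) for x1 in range(1, m + 1)}

sizes = sumset_sizes(P, 8)
for l, size in enumerate(sizes, start=1):
    print(l, size)
# Quadratic growth (2l + 1)^2 while lP stays proper (l <= 4),
# then linear growth 22l + 1 once the two directions collide (l >= 5).
```

For $l \le 4$ the sizes are 9, 25, 49, 81, and from $l = 5$ onwards $lP$ is the full interval $[11l, 33l]$, so the growth drops from quadratic to linear exactly at the first collision.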


Proof We shall induct on $d$. The case $d = 1$ is obvious (indeed, $lP$ is now an
arithmetic progression of length $l|P| - l + 1$), so suppose $d > 1$ and the claim
has already been proven for $d - 1$. We may assume that $l$ is large depending on $d$
since the claim is trivial otherwise (since $lP$ contains a translate of $P$).

Let $C_d > 1$ be a large constant to be chosen later. Now let $k \ge 1$ be the
largest integer such that $2^k \le l/C_d$. If $2^k P$ is proper, then
$|2^k P| = \Omega_d(2^{kd}|P|) = \Omega_d(l^d|P|/C_d^d)$ and the claim follows with
$d' = d$ since $lP$ contains a translate of $2^k P$. Now suppose that $2^k P$ is not
proper. Let $1 \le k' \le k$ be the first integer such that $2^{k'} P$ is improper,
then by arguing as before we see that
$|2^{k'} P| \ge |2^{k'-1} P| = \Omega_d(2^{k'd}|P|)$. Applying Theorem 3.40 we see
that $O_d(1) 2^{k'} P$ contains a proper progression $Q$ of rank $d - 1$ and volume
$\Omega_d(|2^{k'} P|) = \Omega_d(2^{k'd}|P|)$. Applying the induction hypothesis we see
that $2^{k-k'} Q$ contains a proper progression of rank $d'$ and volume at least

$$\Omega_d(2^{(k-k')d'} 2^{k'd}|P|) = \Omega_d(2^{kd'}|P|) = \Omega_d((l/C_d)^{d'}|P|) = \Omega_d(l^{d'}|P|)$$

for some $1 \le d' \le d - 1$. If $C_d$ is large enough, then $lP$ will contain a
translate of $2^{k-k'} O_d(1) 2^{k'} P$ and hence a translate of $2^{k-k'} Q$. The
claim follows.














The above lemma allows us to split the problem of finding a progression of
small rank in $lA$ into two parts. First, we find a progression $P$ of large rank in
$l'A$ for some $l' < l$, and then use the above lemma to find a progression of small
rank in $kP$, where $kl' \le l$. More precisely, we have


Proof of Theorem 12.5 Note from the hypotheses $l^d|A| \ge C_d n$ and
$A \subset [1, n]$ that we can ensure that $l$ is large depending on $d$, simply by
choosing $C_d$ large depending on $d$.

Let $k$ be the largest integer such that $2^k \le l$. Thus

$$2^{kd}|A| \ge l^d|A|/2^d \ge C_d n/2^d.$$

On the other hand we have $2^k A \subset [1, 2^k n]$ and hence

$$|2^k A| \le 2^k n \le \frac{2^d}{C_d} 2^{k(d+1)}|A|.$$

Now let $k' \ge 1$ be the smallest positive integer such that

$$|2^{k'} A| \le 2^{k'(d+3/2)}|A|;$$

then for $C_d$ large enough we have $k' \le k$, and in fact

$$k' \le k - \Omega_d(\log C_d).$$

Set $A' := 2^{k'-1} A$. By construction of $k'$, we have

$$|A'| \ge 2^{(k'-1)(d+3/2)}|A|$$

and hence

$$|2A'| = |2^{k'} A| \le 2^{d+3/2}|A'|.$$

By Exercise 2.3.14, we can find a subset $F \subset A'$ which is symmetric around some
point $x/2$ such that $|F| = \Theta_d(|A'|)$ with doubling constant $O_d(1)$. Applying
the Ruzsa–Chang theorem (Theorem 5.30), we see that $2F - 2F$ contains a proper
progression of rank $O_d(1)$ and volume $\Omega_d(|A'|)$. By the symmetry of $F$, we
see that $2F - 2F$ is a translate of $4F$, which is contained in $4A'$. Thus
$4A' = 2^{k'+1} A$ contains a proper progression $Q$ of rank $O_d(1)$ and volume
$\Omega_d(|A'|)$. Now $lA$ contains a translate of $2^k A$, which in turn contains
$2^{k-k'-1} Q$. Applying Lemma 12.6 we conclude that $lA$ contains a proper
progression $P$ of rank $d'$ and volume

$$|P| = \Omega_d(2^{(k-k'-1)d'}|Q|) = \Omega_d(2^{(k-k')d'}|A'|) = \Omega_d(2^{(k-k')d'} 2^{k'(d+3/2)}|A|)$$

for some $d' = O_d(1)$. On the other hand, since $lA \subset [1, ln]$, we have

$$|lA| \le ln \le \frac{l^{d+1}}{C_d}|A| = O_d\Big(\frac{2^{k(d+1)}}{C_d}|A|\Big).$$

Since $|P| \le |lA|$, we conclude that

$$2^{(k-k')d'} 2^{k'(d+3/2)}|A| \le O_d\Big(\frac{2^{k(d+1)}}{C_d}|A|\Big)$$

and thus

$$2^{(k-k')(d'-d-1)} \le O_d\Big(\frac{1}{C_d}\Big).$$

This implies (for $C_d$ large enough) that $d' < d + 1$, and thus $1 \le d' \le d$
since $d'$ is an integer. We then have

$$|P| = \Omega_d(2^{(k-k')d'} 2^{k'(d+3/2)}|A|) = \Omega_d(2^{kd'}|A|) = \Omega_d(l^{d'}|A|)$$

and the claim follows.





Remark 12.8 The key trick here was to split up the long sum $lA$ by expressing
it as a binary tree of binary sums $2^k A + 2^k A = 2^{k+1} A$. The bounds we had on
$|lA|$ forced one of the binary sums to have small doubling constant, at which point
we could use an inverse theorem, in this case the Freiman cube lemma and the
Ruzsa–Chang theorem. A similar trick was employed in Theorem 2.35; see also
[42]. This method is sometimes referred to as the tree argument.
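The first step of the tree argument, tracking the sizes $|A|, |2A|, |4A|, \ldots$ and locating the first level with small doubling, can be sketched as follows. This is an illustration only: the function names and the cut-off `max_steps` are assumptions, the test sets are arbitrary, and the threshold $2^{k'(d+3/2)}|A|$ is the one used in the proof of Theorem 12.5.

```python
def doubling_chain(A, steps):
    """Return [|A|, |2A|, |4A|, ..., |2^steps A|] by repeated doubling."""
    current = set(A)
    sizes = [len(current)]
    for _ in range(steps):
        current = {x + y for x in current for y in current}
        sizes.append(len(current))
    return sizes


def first_small_doubling(A, d, max_steps=10):
    """Smallest k' >= 1 with |2^k' A| <= 2^(k'(d + 3/2)) |A|, or None
    if no such k' is found within max_steps doublings."""
    current = set(A)
    size_A = len(current)
    for k in range(1, max_steps + 1):
        current = {x + y for x in current for y in current}
        if len(current) <= 2 ** (k * (d + 1.5)) * size_A:
            return k
    return None


interval = set(range(1, 11))   # highly structured: sizes roughly double
sparse = {1, 100, 10000}       # behaves like a rank 2 simplex at small scales
print(doubling_chain(interval, 3))   # [10, 19, 37, 73]
print(doubling_chain(sparse, 3))     # [3, 6, 15, 45]
print(first_small_doubling(interval, d=1))
```

The ratios between consecutive entries of the chain are the doubling constants of the binary sums in the tree; for the interval they stay near 2, while for the sparse set they increase towards 4.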


Remark 12.9 The above proof made use of some rather powerful theorems,
including Theorem 3.40 and the Ruzsa–Chang theorem. However, it is possible to
prove the above results without using such deep facts from additive geometry and
Fourier analysis, instead relying on more elementary inverse theorems such as the
$3k - 3$ theorem (Theorem 5.11) and the Freiman cube lemma (Theorem 5.20). See
[351], and the exercises below. This latter approach turns out to be more robust,
in particular being able to deal with restricted sum sets $l^*A$.


Exercises

12.2.1 Show that Theorem 12.5 implies Theorem 12.4. (Hint: use Exercise 3.2.5.)

12.2.2 Using only the $3k - 3$ theorem and the tree argument, show that if $P$ is an
arithmetic progression of integers and $A \subset P$ is such that
$|A| \ge \delta|P|$, then there exists a positive integer $l = O_\delta(1)$ such that
$lA$ contains an arithmetic progression of length $\Omega_\delta(|P|)$.

12.2.3 Using only the preceding exercise and an iteration argument, show that if $P$
is a progression of integers of rank $d$ and $A \subset P$ is such that
$|A| \ge \delta|P|$, then there exists a positive integer $l = O_{\delta,d}(1)$ such
that $lA$ contains a progression of rank at most $d$ and cardinality
$\Omega_{\delta,d}(|P|)$.

12.2.4 Without using Theorem 3.40, show that if $P$ is a progression of integers of
rank $d$, then there exists a positive integer $l = O_d(1)$ such that $lP$ contains a
proper progression of rank at most $d$ and cardinality $\Omega_d(|P|)$. (You may wish
to use Freiman’s cube lemma, Theorem 5.20 or the variant in Proposition 5.35.)

12.2.5 [351] (Filling lemma) Using only the preceding exercises, show that if $P$ is
a progression of integers of rank $d$ and $A \subset P$ is such that
$|A| \ge \delta|P|$, then there exists a positive integer $l = O_{\delta,d}(1)$ such
that $lA$ contains a proper progression of rank at most $d$ and cardinality
$\Omega_{\delta,d}(|P|)$.

12.2.6 [350] Let $P = a + [0, N] \cdot v$ be a progression of rank $d$. Show that if
$P$ is not proper, then |2P| < (1 − sm) | [0, 2N]|. More generally, prove that

oa)
IkP| <
k
I[0, kN]]|

for all $k \ge 1$. Thus an improper progression becomes “increasingly improper” as
one dilates it.

12.2.7 Using the preceding exercise, show that if $P$ is a proper progression of
integers of rank $d$ such that $2P$ is not proper, then there is a positive integer
$l = O_d(1)$ such that $lP$ contains a proper progression of rank at most $d - 1$ with
cardinality $\Omega_d(|P|)$.

12.2.8 Use the above exercises to give alternate proofs of Lemma 12.6 and
Theorem 12.5.


12.3 Generalizations and variants 


There are various extensions of Theorem 12.4 and Theorem 12.5. An easy
modification of the above arguments allows one to handle distinct summands:

Theorem 12.10 [350] For any fixed positive integer $d$ there is a constant
$C_d > 0$ such that the following holds. Let $A_1, \ldots, A_l$ be subsets of $[1, n]$
of size $m$ where $l$ and $m$ satisfy $l^d m \ge C_d n$. Then $A_1 + \cdots + A_l$
contains a progression of rank $d'$ and volume at least $\Omega_d(l^{d'} m)$, for some
integer $1 \le d' \le d$. In particular, $A_1 + \cdots + A_l$ contains an arithmetic
progression of length at least $\Omega_d(l m^{1/d})$.




From Lemma 5.25 we can also replace the integers by any other torsion-free
additive group without difficulty. A more difficult strengthening is to work with
the restricted sum sets $l^*A$ instead of $lA$. It is possible to adapt the above
methods to deal with this case too:

Theorem 12.11 [350] For any fixed positive integer $d$ there is a constant
$C_d > 0$ such that the following holds. For any positive integers $n$ and $l$ and any
set $A \subset [1, n]$ satisfying $l \le |A|/2$ and $l^d|A| \ge C_d n$, $l^*A$ contains
a proper progression of rank $d'$ and volume at least $\Omega_d(l^{d'}|A|)$, for some
integer $1 \le d' \le d$. In particular $l^*A$ contains a proper arithmetic
progression of length $\Omega_d(l|A|^{1/d})$.


If we define $f^*(m, l, n)$ to be the obvious analog of $f(m, l, n)$ with $lA$
replaced by $l^*A$, then we conclude (using Lemma 12.3 for the upper bound)
that $f^*(m, l, n) = \Theta_d(l m^{1/d})$ whenever
$C_d \frac{n}{l^d} \le m \le c_d \frac{n}{l^{d-1}}$ and $m \ge 2l$. (Note that
$f^*(m, l, n)$ becomes vacuous when $m < l$, so the condition $m \ge 2l$ is fairly
natural.)

Theorem 12.11 is significantly harder than Theorem 12.4 and we will not present
it here. Let us, however, mention an important lemma, which can be viewed as a
variant of Freiman’s theorem for subset sums. This lemma asserts that if $l^*A$ does
not yield a proper GAP as claimed by the theorem, then $A$ must contain a big
subset which has a very rigid structure.


Lemma 12.12 For any positive constants $\nu$ and $d$ there are positive constants
$\delta$, $\alpha$ and $d_1$ such that the following holds. Let $A$ be a subset of
$[n]$, $l$ be a positive integer and $n \ge f(n) \ge 1$ be a function of $n$ such that


max { log" n, (40 f (n) log, n)'°7} < I < |Al/2 
and $l^d|A| f(n) \ge n$. Then one of the following two statements must hold:

• $l^*A$ contains a proper GAP of rank $d'$ and volume $\Omega(l^{d'}|A|)$ for some
$1 \le d' \le d$;

• there is a subset $\tilde{A}$ of $A$ with cardinality at least $\delta|A|$ which is
contained in a GAP $P$ of rank $d_1$ and volume
$O(|A| f(n)^{1+\nu} \log^{\alpha} n)$.

The function $f(n)$ can be seen as a rigidity parameter. The closer $l^d|A|$ is to
$n$, the more rigid is the structure of $\tilde{A}$.

The case $d = 1$ is of special importance, being a generalization of Theorem 12.2,
and we isolate it as a corollary.


Corollary 12.13 There exists a constant $C > 0$ such that whenever
$A \subset [1, n]$ and $1 \le l \le |A|/2$ is such that $l|A| \ge Cn$, then $l^*A$
contains an arithmetic progression of length $cl|A|$.




This result has the following consequence for subset sums:

Corollary 12.14 [350] If $A$ is a subset of $[1, n]$ of cardinality at least
$C\sqrt{n}$ for a sufficiently large absolute constant $C > 0$, then the subset sums
$FS(A)$ contain an arithmetic progression of length $n$.

The first part of this result, with $\sqrt{n}$ replaced by $\sqrt{n \log n}$, was
originally proven by Freiman [115]; we leave the deduction of this corollary from
Corollary 12.13 as an exercise.

Another variant is to work in a cyclic group Z, of prime order instead of in an 
interval [1, n]. Here, one can modify the preceding arguments to obtain 


Theorem 12.15 [350] For any $d \ge 1$ there exists $C_d > 0$ such that the
following holds. For any additive set $A$ in a cyclic group $Z_n$ of prime order and
any $l \ge 1$ with $l^{d+1}|A| \ge C_d n$ and $|A| \ge 2$, the set $lA$ contains a
proper arithmetic progression of length $\min(n, \Omega_d(l|A|^{1/d}))$.


There are two differences between this theorem and Theorem 12.4. First, the
progression has length $\min(n, \Omega_d(l|A|^{1/d}))$ instead of $\Omega_d(l|A|^{1/d})$, but this is natural
since $lA$ cannot exceed $n$ in size. Second, the condition on $l$ has been relaxed
from $l^d|A| \geq C_d n$ to $l^{d+1}|A| \geq C_d n$. This is ultimately because the trivial bound
$|lA| \leq ln$ which was used in the $[1, n]$ case can now be improved to the trivial bound
$|lA| \leq n$. Otherwise the argument is essentially the same, and is left as an exercise.


Exercises 


12.3.1 Prove Theorem 12.10. (Hint: use the tree argument with different sets 
at the leaves. You will need to replace the Ruzsa—Chang theorem by a 
different result, such as Theorem 4.43, or use the elementary approach 
sketched in earlier exercises.) 

12.3.2 Deduce Corollary 12.14 from Corollary 12.13.

12.3.3 Let $\mathbf{Z}_n$ be a cyclic group of prime order, and let $f(m, l, \mathbf{Z}_n)$ be defined
just like $f(m, l, n)$ but now with the sets $A$ lying in $\mathbf{Z}_n$ instead of $[1, n]$.
Show that $f(m, l, \mathbf{Z}_n) = n$ if and only if $l(m - 1) \geq n - 1$. (Hint: use
Exercise 12.1.1.) Show also that for every $d \geq 1$ there exist constants
$C_d, c_d > 0$ such that $f(m, l, n) = \Theta_d(l m^{1/d})$ whenever
$C_d n / l^{d+1} \leq m \leq c_d n / l^{d}$.

12.3.4 Prove Theorem 12.15. (Note that the hypothesis that n is prime will 
prevent any “torsion” issues from arising in the progressions until the 
progressions become of size comparable to n, at which point one can 
proceed using the Cauchy—Davenport inequality instead.) 




12.4 Complete and subcomplete sequences 


An infinite set $A \subset \mathbf{Z}^+$ of positive integers is complete if its subset sums $FS(A)$
contain every sufficiently large positive integer. This notion is similar to, but distinct
from, the concept of a base as studied in Chapter 1, as we allow sums of arbitrary
length but require the summands to be distinct. The notion of complete sequences
was introduced by Erdős in the early sixties and has since then been studied
extensively by various researchers (see [89, Section 6] or [274, Section 4.3] for
surveys). The central problem in this area is to find necessary and sufficient conditions for
a sequence to be complete.

Intuitively, the denser a set is, the more likely it is to be complete. However,
density alone is not sufficient; the even integers have density 1/2 but are not
complete. More generally, any set contained in an infinite arithmetic progression
containing zero will not be complete. To deal with these cases, let us say that a set
$A$ is subcomplete if $FS(A)$ contains an infinite arithmetic progression $a + \mathbf{N} \cdot r =
\{a, a + r, a + 2r, \ldots\}$. It is easy to see that these two notions are related as follows:


Lemma 12.16 Let $A \subset \mathbf{Z}^+$ be infinite. Then $A$ is complete if and only if $A$ is
subcomplete and $FS(A)$ intersects every infinite arithmetic progression in $\mathbf{Z}^+$.


We leave this lemma as an exercise. The condition that $FS(A)$ intersects every
infinite arithmetic progression in $\mathbf{Z}^+$ is a local condition that only depends on the
residue classes that $A$ occupies, together with their multiplicity; see exercises. In
particular, this condition is typically quite easy to verify for standard bases such
as the Waring bases N^k or the primes P. Thus we shall focus on the subcomplete
property.
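
As an illustrative aside, the following Python sketch computes the subset sums of a finite truncation of $A$. A finite computation cannot decide completeness or subcompleteness; it only shows how the definitions behave on small examples. The helper name `subset_sums` and the two truncations are our own choices and do not come from the text.

```python
def subset_sums(elements):
    """Return all sums of non-empty sets of distinct elements.

    Assumes `elements` is a finite collection of distinct positive integers
    (a finite truncation of the infinite set A in the text).
    """
    sums = {0}                      # start from the empty sum
    for a in elements:
        sums |= {s + a for s in sums}
    sums.discard(0)                 # drop the empty sum again
    return sums


# Powers of two: every integer in [1, 2^6 - 1] is a subset sum.
powers = [2 ** i for i in range(6)]
print(sorted(subset_sums(powers)) == list(range(1, 64)))

# Even integers: every subset sum is even, so no odd number is reached.
evens = list(range(2, 21, 2))
print(all(s % 2 == 0 for s in subset_sums(evens)))
```

The second example is the obstruction described above: a set inside a progression through zero can only produce sums inside that progression.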

A simple example of Cassels [46] shows that there exist sets $A$ of density
$|A \cap [1, n]| = \Omega(n^{1/2})$ which are not subcomplete; see exercises. Remarkably,
this example is sharp up to constants:


Theorem 12.17 [350] There exists an absolute constant $C > 0$ such that every
infinite set $A \subset \mathbf{Z}^+$ with $|A \cap [1, n]| \geq Cn^{1/2}$ is subcomplete. In particular, if
$FS(A)$ also intersects every infinite arithmetic progression in $\mathbf{Z}^+$, then $A$ is
complete.


We prove this result in the next section; the main tool is Corollary 12.14. The
second part of this result was conjectured by Erdős [85] in 1962, and the first part
by Folkman [103] in 1966. In [85] the second part was proven under the stronger
hypothesis $|A \cap [1, n]| \geq Cn^{(\sqrt{5}-1)/2}$, while in [103] the first part was proven under
the hypothesis $|A \cap [1, n]| \geq n^{1/2+\varepsilon}$ for any $\varepsilon > 0$ and sufficiently large $n$. This
was lowered to $Cn^{1/2} \log^{1/2} n$ by Hegyvári [181] and Łuczak and Schoen [241],
using Sárközy's theorem (see the exercises).




There is an analog of the above results for infinite multisets $A = \{a_1, a_2, \ldots\}$
in $\mathbf{Z}^+$, where the elements $a_1 \leq a_2 \leq \cdots$ are allowed to have repetitions. We define the finite
sum sets

$$FS(A) := \Big\{ \sum_{i \in I} a_i : I \subset \mathbf{Z}^+, \ I \text{ finite} \Big\}$$

in analogy with before, and define the notions of completeness and subcompleteness
as above. In this case it is possible to have a density as large as $|A \cap [1, n]| =
\Omega_\varepsilon(n^{1-\varepsilon})$ for any given $\varepsilon > 0$ (where of course we count multiplicity) and still not
have subcompleteness (see exercises). Again, this example is basically sharp.


Theorem 12.18 [350] There exists an absolute constant $C > 0$ such that every
infinite multiset $A \subset \mathbf{Z}^+$ with $|A \cap [1, n]| \geq Cn$ is subcomplete. In particular,
if $FS(A)$ also intersects every infinite arithmetic progression in $\mathbf{Z}^+$, then $A$ is
complete.


This result was conjectured by Folkman [103], and is proven very similarly to
Theorem 12.17; we leave it as an exercise for the next section, where Theorem 12.17
is proved.

To end this section, let us discuss the finite version of completeness. We say
that a subset $A$ of $\mathbf{Z}_p$ (for a large prime $p$) is complete if $FS(A) = \mathbf{Z}_p$. Olson
[265], answering a question of Erdős and Heilbronn, proved that if $|A| > 2\sqrt{p}$,
then $A$ is complete. The bound $2\sqrt{p}$ is essentially sharp. To see this, take $A =
\{-k, -(k-1), \ldots, -1, 0, 1, \ldots, (k-1), k\}$, where $k$ is the largest integer such
that $\sum_{i=1}^{k} i < p/2$. Deshouillers and Freiman [70] showed that this is actually
the only example, given that $|A|$ is sufficiently large. We call a set $A$ of integers
between $0$ and $p - 1$ small if

$$\sum_{a \in A} \|a/p\| < 1,$$

where $\|z\|$ (as usual) is the distance from $z$ to the closest integer. It is easy to check
that a small set is not complete.
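
For concreteness, here is a small Python sketch of the extremal example and of the smallness condition, using exact rational arithmetic. The function names are hypothetical, the prime $p = 23$ is an arbitrary choice, and we assume that the empty sum counts as a subset sum (this is harmless here since $0 \in A$).

```python
from fractions import Fraction


def subset_sums_mod(elements, p):
    """Subset sums of `elements` reduced mod p (empty sum included)."""
    sums = {0}
    for a in elements:
        sums |= {(s + a) % p for s in sums}
    return sums


def dist_to_nearest_integer(z):
    """||z|| for a rational z: the distance from z to the closest integer."""
    frac = z - (z.numerator // z.denominator)   # fractional part in [0, 1)
    return min(frac, 1 - frac)


def smallness(elements, p):
    """The quantity sum_{a in A} ||a/p|| from the definition of a small set."""
    return sum(dist_to_nearest_integer(Fraction(a % p, p)) for a in elements)


def extremal_example(p):
    """{-k, ..., k} mod p, with k the largest integer such that 1 + ... + k < p/2."""
    k = 0
    while (k + 1) * (k + 2) // 2 < Fraction(p, 2):
        k += 1
    return [a % p for a in range(-k, k + 1)]


p = 23
A = extremal_example(p)
print(len(A), smallness(A, p), len(subset_sums_mod(A, p)))
```

For $p = 23$ one gets $k = 4$, so $|A| = 9$, the smallness sum equals $20/23 < 1$, and the subset sums cover only $21$ of the $23$ residue classes.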


Theorem 12.19 Let $A$ be a subset of $\mathbf{Z}_p$ with more than $\sqrt{2p}$ elements. If $A$ is not
complete, then there is a non-zero element $x$ of $\mathbf{Z}_p$ such that the set $x \cdot A$ is small.


Szemerédi and Vu [349, 352] showed that it is possible to weaken the condition
$|A| > \sqrt{2p}$ considerably by dropping a small subset from $A$.


Theorem 12.20 Let $A$ be a non-complete subset of $\mathbf{Z}_p$. Then there is a subset $A'$ of
$A$ with at most p® elements and a non-zero element $x$ of $\mathbf{Z}_p$ such that $x \cdot (A \setminus A')$
is small.




Exercises 


12.4.1 Show that any infinite set $A \subset \mathbf{Z}^+$ of lower density strictly greater than
1/2 is complete.

12.4.2 Prove Lemma 12.16. Generalize this to multisets.

12.4.3 Let $A \subset \mathbf{Z}^+$ be an infinite set, and for each positive integer $N$ let
$A_N^\infty \subset \mathbf{Z}_N$ be the set of all residue classes $a$ mod $N$ whose intersection
with $A$ is infinite, and let $A'_N$ be those elements of $A$ which do not
lie in one of the residue classes in $A_N^\infty$ (this set is automatically finite).
Show that $FS(A)$ intersects every infinite arithmetic progression in $\mathbf{Z}^+$ if
and only if $FS(A'_N) \bmod N + \langle A_N^\infty \rangle = \mathbf{Z}_N$ for all $N$. Note that one has
$FS(A'_N) \bmod N = FS(A'_N \bmod N)$ if we view $A'_N \bmod N$ as a multiset;
thus this criterion uses only the multiplicities of $A$ modulo $N$ rather than
the actual values of $A$.

12.4.4 Consider an infinite sequence $A = \{a_1, a_2, \ldots\}$. Prove that if

$$\limsup_{i \to \infty} \Big( a_i - \sum_{j=1}^{i-1} a_j \Big) = \infty, \qquad (12.1)$$

then $A$ is not subcomplete.

12.4.5 [46],[351] Let m := 10* and let $A := \bigcup_k [m^{2^k}/4, m^{2^k}/2]$. Show that
$|A \cap [1, n]| = \Omega(n^{1/2})$ for all $n$, and $A$ intersects every infinite arithmetic
progression in $\mathbf{Z}^+$, but that $A$ is neither subcomplete nor complete.

12.4.6 Modifying the previous example, show that for any $\varepsilon > 0$ there exists a
multiset $A$ with $|A \cap [1, n]| = \Omega_\varepsilon(n^{1-\varepsilon})$ which intersects every infinite
arithmetic progression in $\mathbf{Z}^+$ but is not subcomplete.

12.4.7 [181], [241] Let $A \subset [1, n]$ be such that $|A| \geq Cn^{1/2} \log^{1/2} n$ for some
large constant $C > 0$. Show that there exists $1 \leq i \leq O(\log n)$ and a set
$A_i \subset [1, 2n]$ such that $2^i|A_i| = \Omega(Cn)$ and each element in $A_i$ can be
written as the sum of two distinct elements of $A$ between $2^i$ and $2^{i+1}$ times.
Use this and Theorem 12.2 to prove the first part of Corollary 12.14 with
$n^{1/2}$ replaced by $n^{1/2} \log^{1/2} n$. By using the arguments of the next section,
this establishes Theorem 12.17, again with $n^{1/2}$ replaced by $n^{1/2} \log^{1/2} n$.


12.5 Proof of Theorem 12.17 


To prove Theorem 12.17, it is convenient to reduce the (infinitary) condition of
subcompleteness to a finitary version. Let us say that a partition $A = A' \cup A''$ of
a multiset $A$ of positive integers is good if the following two properties hold:




• there is a number $d$ such that $FS(A')$ contains arbitrarily long arithmetic
progressions with difference $d$;
• let $A'' = \{b_1 \leq b_2 \leq b_3 \leq \cdots\}$, then

$$\lim_{i \to \infty} \Big( \sum_{j=1}^{i-1} b_j \Big) - b_i = +\infty. \qquad (12.2)$$

Thus $A'$ enjoys a finitary version of subcompleteness, whereas $A''$ grows slower
than a lacunary sequence. These two conditions imply subcompleteness:


Lemma 12.21 [351, 241, 181] Any sequence $A$ of positive integers which admits
a good partition is subcomplete.


Proof We begin with some reductions. First observe that we can remove finitely
many elements from $A''$ without affecting the condition (12.2). In particular, we
can remove any residue class $a + d \cdot \mathbf{Z}$ mod $d$ which contains only finitely many
elements from $A''$.

Let $A''_d \subset \mathbf{Z}_d$ be the set of residue classes mod $d$ which intersect $A''$ (and thus
contain infinitely many elements from $A''$, by the above reduction). The group
$\langle A''_d \rangle$ spanned by $A''_d$ is a subgroup of $\mathbf{Z}_d$ and thus has the form $\langle A''_d \rangle = d' \cdot \mathbf{Z}_d$ for
some factor $d'$ of $d$. In particular we see that every element of $A''$ is a multiple
of $d'$. Observe that $FS(A'')$ must intersect every residue class in $\langle A''_d \rangle$. Thus there
exists a finite set $B \subset A''$ such that $FS(B)$ intersects every residue class in $d' \cdot \mathbf{Z}_d$.

Let $A''' := A'' \setminus B$; thus $A'''$ still obeys (12.2). A simple greedy algorithm argument
then shows that the subset sums $FS(A''')$ are syndetic (have bounded gaps);
more precisely, there exists $L > 0$ such that given any positive integer $n$, we
can find an element $m \in FS(A''')$ such that $0 \leq n - m \leq L$. Since $FS(A''')$ consists
entirely of multiples of $d'$, and $FS(B)$ intersects every residue class in
$d' \cdot \mathbf{Z}_d$, we conclude that the set $(FS(A''') + FS(B)) \cap (d \cdot \mathbf{Z})$ is also syndetic.
Since $FS(A')$ contains arbitrarily long progressions of step $d$, we conclude
that $FS(A') + [(FS(A''') + FS(B)) \cap (d \cdot \mathbf{Z})]$ contains an infinite progression of
step $d$. But since this set is contained in $FS(A)$, we see that $A$ is subcomplete
as claimed.














We can now prove Theorem 12.17. 


Proof of Theorem 12.17 We write $A = \{a_1 < a_2 < \cdots\}$, and split $A = A' \cup A''$
where $A' := \{a_{2m} : m \in \mathbf{Z}^+\}$ and $A'' := \{a_{2m-1} : m \in \mathbf{Z}^+\}$. It is easy to see using
the hypothesis $|A \cap [1, n]| \geq Cn^{1/2}$ that the set $A''$ will obey (12.2) and we leave
it as an exercise. Thus we only need to show that $FS(A')$ contains arbitrarily long
arithmetic progressions of a fixed step $d$.

For each non-negative integer $j$, let $A'_j := \{a_{2m} : 2^j \leq m < 2^{j+1}\}$. Thus the
$A'_j$ partition $A'$. Also, the hypothesis $|A \cap [1, n]| \geq Cn^{1/2}$ implies that for all
sufficiently large $j$ we have $A'_j \subset [1, n_j]$ for some $n_j = \Theta(2^{2j}/C^2)$. From Corollary
12.14 we conclude (if $C$ is large enough) that $FS(A'_j)$ contains a proper
arithmetic progression $P_j$ of length $n_j$ for all $j$ larger than some initial $j_0$. Note
that $FS(A'_j) \subset [1, 2^j n_j]$, hence the step $d_j$ of the progression $P_j$ cannot exceed
$2^j$.

This is almost what we need, except that the progressions P; do not have the 
same step. This however can be dealt with using the following elementary lemma, 
which follows from Exercise 3.6.5. 


Lemma 12.22 (Coalescence of arithmetic progressions) [349, 350] Let $P_1, P_2$
be proper arithmetic progressions of integers of length $N_1, N_2$ and step $d_1, d_2 > 0$
respectively, where $N_2 \geq 5d_1$ and $N_1 \geq 5d_2$. Then $P_1 + P_2$ contains a proper
arithmetic progression of length $N_1 + N_2 - 2$ whose difference is $\gcd(d_1, d_2)$.
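
The statement of the lemma can be sanity-checked by brute force on small instances. The Python sketch below is only an illustration, not a proof; the function names and the test parameters are our own assumptions. It builds $P_1 + P_2$ explicitly and measures the longest progression of step $\gcd(d_1, d_2)$ inside it.

```python
from math import gcd


def progression(start, step, length):
    """The arithmetic progression {start, start + step, ...} with `length` terms."""
    return [start + step * i for i in range(length)]


def longest_progression(values, step):
    """Length of the longest progression with difference `step` contained in `values`."""
    s = set(values)
    best = 0
    for v in s:
        if v - step in s:
            continue                # v is not the first term of a maximal run
        run = 1
        while v + run * step in s:
            run += 1
        best = max(best, run)
    return best


def coalesced_length(a1, d1, n1, a2, d2, n2):
    """Longest progression of step gcd(d1, d2) inside P1 + P2."""
    p1 = progression(a1, d1, n1)
    p2 = progression(a2, d2, n2)
    sumset = {x + y for x in p1 for y in p2}
    return longest_progression(sumset, gcd(d1, d2))


# d1 = 2, d2 = 3 with N1 = 15 >= 5*d2 and N2 = 10 >= 5*d1.
print(coalesced_length(0, 2, 15, 0, 3, 10), ">=", 15 + 10 - 2)
```

In this instance the sum set contains the whole interval $[2, 53]$, which is much longer than the $N_1 + N_2 - 2 = 23$ terms guaranteed by the lemma.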


Using this lemma we can see inductively that for $j_0$ sufficiently large, the set
$P_{j_0} + \cdots + P_j$ contains a proper arithmetic progression of length $n_{j_0} + \cdots +
n_j - O(j)$ and step $\gcd(d_{j_0}, \ldots, d_j)$ for each $j \geq j_0$. The steps $\gcd(d_{j_0}, \ldots, d_j)$
are non-increasing positive integers and thus must eventually stabilize at some fixed
$d$. Since $P_{j_0} + \cdots + P_j$ is contained in $FS(A'_{j_0}) + \cdots + FS(A'_j)$, which in turn is
contained in $FS(A')$, and $n_{j_0} + \cdots + n_j - O(j)$ goes to $\infty$ as $j \to \infty$, the claim
follows.














The proof of Theorem 12.18 is similar and is left as an exercise. 


Exercises 


12.5.1 Show that the set $A''$ used in the proof of Theorem 12.17 obeys (12.2).

12.5.2 [350] Show that there is a constant $C$ such that the following holds. If $A$ is a
multiset of positive integers in $[1, n]$ with $|A| \geq Cn$, then $FS(A)$ contains
an arithmetic progression of length $n$. (Hint: Use Theorem 12.10.)

12.5.3 [350] Prove Theorem 12.18. (Hint: use the previous exercise as a substi- 
tute for Corollary 12.14.) 


12.6 Further applications 


In this section, we present a few short applications of Corollary 12.13, taken from 
[380]. For several applications of Theorem 12.2 we refer to [308, 309] and the 
references therein. 

The following simple lemma will be useful. Let $\mathbf{Z}_n^\times$ denote the residue classes
in $\mathbf{Z}_n$ which are coprime to $n$.




Lemma 12.23 Let $n$ be a positive integer and $A$ be a multiset of $k$ elements
of $\mathbf{Z}_n^\times$ for some $1 \leq k \leq n$. Then $|FS(A)| \geq k$. In particular, if $|A| = n$, then
$FS(A) = \mathbf{Z}_n$.


Proof Observe that if $a \in \mathbf{Z}_n^\times$, then $|FS(A \cup \{a\})| = |FS(A) + \{0, a\}| \geq
|FS(A)| + 1$ whenever $FS(A) \neq \mathbf{Z}_n$, by Kneser's theorem (Theorem 5.5) or direct
computation. The claim then follows by induction.
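
The induction can be traced numerically. The following Python sketch is an illustration under stated assumptions: the helper names are hypothetical, and we adopt the convention that the empty sum belongs to $FS(A)$, which is what the identity $FS(A \cup \{a\}) = FS(A) + \{0, a\}$ uses. It adds the elements one at a time and records how the number of subset sums modulo $n$ grows.

```python
from math import gcd


def subset_sum_growth(multiset, n):
    """Sizes of FS mod n after adding each element of `multiset` in turn.

    Assumes every element is coprime to n; the empty sum 0 is included.
    """
    assert all(gcd(a, n) == 1 for a in multiset), "elements must be units mod n"
    sums = {0}
    sizes = []
    for a in multiset:
        sums |= {(s + a) % n for s in sums}   # FS(A u {a}) = FS(A) + {0, a}
        sizes.append(len(sums))
    return sizes


# A multiset of units modulo 12; repetitions are allowed.
print(subset_sum_growth([5, 5, 7, 11, 1, 1, 7], 12))
```

Each step increases the count by at least one until all of $\mathbf{Z}_n$ is covered, which is the content of the lemma.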














12.6.1 Olson’s problem 


We say that an additive set $A$ in a finite ambient group $Z$ is complete if $FS(A) = Z$.
In this section, we investigate the case when $Z$ is a cyclic group $Z = \mathbf{Z}_n$.

A well-known result of Olson [265], mentioned in Section 12.4, shows that if
$n$ is a prime and $|A| > 2n^{1/2}$, then $A$ is complete.


Theorem 12.24 [265] If $n$ is a prime and $A \subset \mathbf{Z}_n$ has cardinality $|A| > 2n^{1/2}$,
then $A$ is complete.


Olson later extended his result to an arbitrary finite group [268], with the constant
2 replaced by a larger constant. Here we give a short proof for the case when
the group is cyclic.


Theorem 12.25 There is a constant $C$ such that the following holds. If $n$ is a
sufficiently large positive integer and $A \subset \mathbf{Z}_n^\times$ has cardinality $|A| \geq Cn^{1/2}$, then
$A$ is complete.


Remark 12.26 The assumption that the elements of $A$ are coprime with $n$ is
necessary. For instance, if $n$ is divisible by 3 then it is possible to have an incomplete
set of size $n/3$. Without the coprime assumption, the problem of bounding $|A|$ is
known as Diderrich's problem. It has been proved that the sharp bound for $|A|$ is
$p + n/p - 2$, where $p$ is the smallest prime divisor of $n$ (see [235] for the case of
cyclic groups and [127] for the general case of arbitrary additive groups).


Proof For convenience, we identify the elements of $A$ as positive integers in
$[1, n - 1]$. Let us split $A$ into two components $A' \cup A''$ each of cardinality at
least $Cn^{1/2}/2 - 1$. By choosing $C$ large enough, we see from Corollary 12.14 that
$FS(A')$ (viewed as a subset of $\mathbf{Z}$) will contain a proper arithmetic progression $P'$
of length $n$. If the step $d'$ of this progression is coprime to $n$, then it will cover all
the residue classes of $\mathbf{Z}_n$, and we will be done, so suppose that this is not the case;
then the quantity $d := \gcd(d', n)$ is larger than 1. Then $FS(A')$ intersects all the
residue classes in $d \cdot \mathbf{Z}_n$, so it will suffice to show that $FS(A'')$ intersects all the
residue classes in $\mathbf{Z}_d$.




Note that the largest element in $FS(A')$ (and hence in $P'$) is at most $O(n^{3/2})$,
hence $d'$ (and $d$) are $O(n^{1/2})$. In particular, we see by choosing $C$ large enough
that $|A''| \geq d'$. By Lemma 12.23 $FS(A'')$ intersects all the residue classes in $\mathbf{Z}_d$,
and we are done.














We conjecture that one can have $C = 2$ in Theorem 12.25. Hamidoune [173]
made the following general conjecture for arbitrary finite groups.


Conjecture 12.27 Let $G$ be a cyclic group of any order or a group (possibly
nonabelian) of odd order, and let $A$ be a subset of $G$ such that, for every subgroup
$H$ of $G$, we have $|H \cap A| \geq 2\sqrt{|H|}$. Then $A$ is complete.


12.6.2 Monochromatic sum sets 


Let f(n) be the smallest number such that one can color [1, n — 1] by f(n) colors 
so that n cannot be represented as sum of distinct numbers with the same color. 
Alon and Erdős [9] established the correct order of growth of f (n) up to logarithmic 
factors. 

Theorem 12.28 For all large $n$ we have $f(n) = O\Big(\frac{n^{1/3} (\log \log n)^{1/3}}{\log^{1/3} n}\Big)$ and $f(n) = \Omega\Big(\frac{n^{1/3}}{\log^{4/3} n}\Big)$.


It is conjectured [9] that the exact order of magnitude of $f(n)$ is closer to the
upper bound. Combining Corollary 12.13 with the arguments from [9], one obtains
the following improvement.


Theorem 12.29 [380] For all large $n$ we have $f(n) = \Omega\Big(\frac{n^{1/3}}{\log n}\Big)$.


Proof Let $c > 0$ be a small number to be chosen later, and color $[1, n - 1]$ by at
most $c \frac{n^{1/3}}{\log n}$ colors. It will suffice to represent $n$ as the sum of distinct numbers of
the same color.

From the prime number theorem (1.44) (or Exercise 1.10.4) there are $\Theta\big(\frac{n^{2/3}}{\log n}\big)$
primes of magnitude $\Theta(n^{2/3})$. By the pigeonhole principle, we can thus find
a monochromatic set $A$ of primes of magnitude $\Theta(n^{2/3})$ of cardinality $|A| =
\Theta(c^{-1} n^{1/3})$. Let us partition $A = A_1 \cup A_2 \cup A_3$ where each $A_i$ also has cardinality
$\Theta(c^{-1} n^{1/3})$. It would thus suffice to show that $FS(A) = FS(A_1) + FS(A_2) +
FS(A_3)$ contains $n$.

Applying Corollary 12.13 with $l = \Theta(c^{1/2} n^{1/3})$ we see (for $c$ small enough)
that $l^*A_1$, and hence $FS(A_1)$, contains an arithmetic progression $P_1$ of length
$\Omega(c^{-1/2} n^{2/3})$. Since the elements of $l^*A_1$ have magnitude at most $O(c^{1/2} n)$, we
thus see the elements of $P_1$ do also, and that the step $d$ of $P_1$ is at most $O(cn^{1/3})$.

Now the elements of $A_2$ are primes and are larger than $d$. In particular we
can find a subset $A'_2$ of $A_2$ with $|A'_2| = d$ and all elements of $A'_2$ coprime to $d$.
By Lemma 12.23 we see that $FS(A'_2)$ intersects all the residue classes in $\mathbf{Z}_d$.
Also note that the elements in $FS(A'_2)$ are quite small, having magnitude at most
$O(dn^{2/3})$.

Now we add the long progression $P_1$ of step $d$ to the small set $FS(A'_2)$ which
intersects all the residue classes in $\mathbf{Z}_d$, and observe that the set $P_1 + FS(A'_2)$
contains an interval $I$ of length at least $\Omega(c^{-1/2} n^{2/3})$. Indeed, if $P_1 = \{a, a +
d, \ldots, a + md\}$, then one easily verifies that $P_1 + FS(A'_2)$ contains the interval
$I := [a + O(dn^{2/3}), a + md]$ which has length $\Omega(md) = \Omega(m) = \Omega(c^{-1/2} n^{2/3})$,
taking $c$ suitably small of course.
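
The "one easily verifies" step is the following elementary fact: if $R$ is a set of non-negative integers meeting every residue class mod $d$, with largest element $M$, then $\{a, a + d, \ldots, a + md\} + R$ contains the interval $[a + M, a + md]$. The Python sketch below checks this on a toy instance; the function names and the sample values of $a$, $d$, $m$ and $R$ are hypothetical choices for illustration, with $R$ playing the role of $FS(A'_2)$.

```python
def sumset_with_progression(a, d, m, residues):
    """Return P + R for P = {a, a + d, ..., a + m*d} and a finite set R."""
    progression = [a + i * d for i in range(m + 1)]
    return {x + r for x in progression for r in residues}


def covers_all_residues(residues, d):
    """True if `residues` meets every residue class modulo d."""
    return {r % d for r in residues} == set(range(d))


def guaranteed_interval(a, d, m, residues):
    """Endpoints of the interval [a + max(R), a + m*d] (R assumed non-negative)."""
    return a + max(residues), a + m * d


# Toy instance: step 7, and a set R hitting each residue class mod 7 once.
a, d, m = 100, 7, 50
R = [0, 15, 23, 10, 32, 5, 13]
S = sumset_with_progression(a, d, m, R)
lo, hi = guaranteed_interval(a, d, m, R)
print(covers_all_residues(R, d), all(x in S for x in range(lo, hi + 1)))
```

The reason is that for $x$ in this interval one picks $r \in R$ with $r \equiv x - a \pmod d$; then $x - r = a + td$ with $0 \leq t \leq m$.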

Finally, observe that all the elements of $A_3$ have magnitude $\Theta(n^{2/3})$ and their
total sum is $\Theta(c^{-1} n)$, which is larger than $n$ for $c$ small enough. Thus by the
greedy algorithm one can subtract distinct elements of $A_3$ from $n$ until one enters
the interval $I$ from the preceding paragraph. This shows that $n \in P_1 + FS(A'_2) +
FS(A_3) \subset FS(A)$, as claimed.














Remark 12.30 The proof was rather wasteful. We lose a factor of log n by reducing 
ourselves to the set of primes. On the other hand, the only thing we need is that 
our set contains enough primes in order to apply Lemma 12.23 to $A_2$. Note though
that the elements with small prime factors coprime to n can be grouped into large 
color classes for which no subset sum can equal n (see the exercise), so the primes 
are in fact a large subset of the “useful” elements of [1, n]. 


Exercise 


12.6.1 [9] Prove the upper bound in Theorem 12.28. (Hint: experiment with the
color class consisting of all the multiples of $p$, where $p$ is a prime not
dividing $n$ that is not too large, the color class $[\frac{n}{k+1}, \frac{n}{k})$ for integers $k$ that
are not too large, and color classes consisting of elements whose total
sum is less than $n$.)


Bibliography 





[1] M. Ajtai, V. Chvátal, M. Newborn, and E. Szemerédi, Crossing-free subgraphs,
Annals of Discrete Mathematics 12 (1982), 9-12.

[2] M. Ajtai, J. Komlós, and E. Szemerédi, A dense infinite Sidon sequence, European
J. Combin. 2 (1981), (1), 1-11.

[3] M. Ajtai and E. Szemerédi, Sets of lattice points that form no squares, Studia
Scientiarum Mathematicarum Hungarica 9 (1974), 9-11.

[4] N. Alon, Combinatorial Nullstellensatz, Recent trends in combinatorics
(Mátraháza, 1995), Combin. Probab. Comput. 8 (1999), (1-2), 7-29.

[5] N. Alon, Additive Latin transversals, Israel J. Math. 117 (2000), 125-130.

[6] N. Alon and M. Dubiner, A lattice point problem and additive number theory,
Combinatorica 15 (1995), (3), 301-309.

[7] N. Alon, A. Shapira, Every monotone graph property is testable. In STOC'05:
Proceedings of the 37th Annual ACM Symposium on Theory of Computing, Association
of Computing Machinery (2005), 128-137.

[8] N. Alon, R. Duke, H. Lefmann, V. Rödl and R. Yuster, The algorithmic aspects of
the regularity lemma, J. Algorithms 16 (1994), 80-109.

[9] N. Alon and P. Erdős, Sure monochromatic subset sums, Acta Arith. 74 (1996),
(3), 269-272.

[10] N. Alon and D.J. Kleitman, Sum-free subsets. In A Tribute to Paul Erdős, A. Baker,
B. Bollobás and A. Hajnal (eds), Cambridge University Press (1990), 13-26.

[11] N. Alon, M. Nathanson and I. Ruzsa, The polynomial method and restricted sums
of congruence classes, J. Number Theory 56 (1996), (2), 404-417.

[12] N. Alon and J. Spencer, The Probabilistic Method, Second edition, Wiley, (2000).

[13] G.E. Andrews, A lower bound for the volume of strictly convex bodies with many
boundary lattice points, Trans. Amer. Math. Soc. 106 (1963), 270-279.

[14] B. Aronov, J. Pach, M. Sharir and G. Tardos, Distinct distances in three and higher
dimensions, Combin. Probab. Comput. 13 (2004), (3), 283-293.

[15] A. Balog, Linear equations in primes, Mathematika 39 (1992) 367-378.

[16] A. Balog and E. Szemerédi, A statistical theorem of set addition, Combinatorica
14 (1994), 263-268.

[17] A. Baltz, T. Schoen and A. Srivastav, Probabilistic construction of small strongly
sum-free sets via large Sidon sets. In Randomization, Approximation, and
Combinatorial Optimization, Lecture Notes in Computer Science 1671, Springer,
(1999), 138-143.

[18] B. Barak, R. Impagliazzo and A. Wigderson, Extracting randomness using few
independent sources. In Proceedings FOCS 2004, IEEE Computer Society, (2004),
384-393.

[19] J. Beck, On the lattice property of the plane and some problems of Dirac, Motzkin,
and Erdős in combinatorial geometry, Combinatorica 3 (1983), 281-297.

[20] W. Beckner, Inequalities in Fourier analysis, Annals of Math. 102 (1975), 159-182.

[21] F.A. Behrend, On sets of integers which contain no three terms in arithmetic
progression, Proc. Nat. Acad. Sci. 32 (1946), 331-332.

[22] V. Bergelson, B. Host and B. Kra, Multiple recurrence and nilsequences, with an
appendix by Imre Ruzsa. Invent. Math. 160 (2005), (2), 261-303.

[23] V. Bergelson and A. Leibman, Polynomial extensions of van der Waerden's and
Szemerédi's theorems, J. Amer. Math. Soc. 9 (1996), 725-753.

[24] V. Bergelson and A. Leibman, Set polynomials and polynomial extension of the
Hales-Jewett theorem, Ann. of Math. (Series 2) 150 (1999), (1), 33-75.

[25] S. Bernstein, Sur une modification de l'inéqualité de Tchebichef, Annal. Sci. Inst.
Sav. Ukr. Sect. Math. I (1924).

[26] Y. Bilu, The (α + 2β)-inequality on a torus, J. London Math. Soc. (Series 2) 57
(1998), (3), 513-528.

[27] Y. Bilu, Addition of integer sequences and subsets of real tori. In Number Theory
in Progress: Proc. Int. Conf. in Number Theory in Honor of A. Schinzel, Zakopane,
1997, K. Győry, H. Iwaniec and J. Urbanowicz (eds), W. de Gruyter, (1999),
639-649.

[28] Y. Bilu, Structure of sets with small sumset, Structure theory of set addition.
Astérisque 258 (1999), xi, 77-108.

[29] Y. Bilu, V. Lev and I. Ruzsa, Rectification principles in additive number theory,
Discrete Comput. Geom. 19 (1998), 343-353.

[30] N. N. Bogolyubov, Sur quelques propriétés arithmétiques des presque-périodes,
Ann. Chaire Math. Phys. Kiev 4 (1939), 185-194.

[31] B. Bollobás, Sperner systems consisting of pairs of complementary subsets, J.
Comb. Theory, Ser. A 15 (1973), 363-366.

[32] B. Bollobás, Combinatorics: Set Systems, Hypergraphs, Families of Vectors, and
Combinatorial Probability, Cambridge University Press, (1986).

[33] A. Bonami, Etudes des coefficients Fourier des fonctiones de $L^p(G)$, Ann. Inst.
Fourier (Grenoble) 20 (1970) (2), 335-402.

[34] J. Bourgain, A Szemerédi type theorem for sets of positive density in $\mathbf{R}^k$, Israel J.
Math. 54 (1986) (3), 307-316.

[35] J. Bourgain, Bounded orthogonal systems and the Λ(p)-set problem, Acta Math.
162 (1989), (3-4), 227-245.

[36] J. Bourgain, On arithmetic progressions in sums of sets of integers. In A Tribute to
Paul Erdős, Cambridge University Press, (1990), 105-110.

[37] J. Bourgain, Estimates related to sumfree subsets of sets of integers, Israel J. Math.
97 (1997), 71-92.

[38] J. Bourgain, On the dimension of Kakeya sets and related maximal inequalities,
GAFA 9 (1999), 256-282.


[39] J. Bourgain, On triples in arithmetic progression, GAFA 9 (1999), 968-984.

[40] J. Bourgain, Estimates on exponential sums related to the Diffie-Hellman
distributions, GAFA 15 (2005) (1), 1-34.

[41] J. Bourgain, Mordell's exponential sum estimate revisited, J. Amer. Math. Soc. 18
(2005) (2), 477-499.

[42] J. Bourgain and M. Chang, On the size of k-fold sum and product sets of integers,
J. Amer. Math. Soc. 17 (2004) (2), 473-497.

[43] J. Bourgain, N. Katz and T. Tao, A sum-product estimate in finite fields, and
applications, GAFA 14 (2004), 27-57.

[44] J. Bourgain and S. Konyagin, Estimates for the number of sums and products and
for exponential sums over subgroups in fields of prime order, C. R. Acad. Sci. Paris,
Ser. I 337 (2003), 75-80.

[45] T. Brown and J. Buhler, A density version of a geometric Ramsey theorem, J.
Combin. Theory Ser. A 32 (1982) (1), 20-34.

[46] J.W.S. Cassels, On the representation of integers as the sums of distinct summands
taken from a fixed set, Acta Sci. Math. Szeged 21 (1960), 111-124.

[47] A.L. Cauchy, Recherches sur les nombres, J. Ecole Polytech. 9 (1813), 99-116.

[48] M. Chang, A polynomial bound in Freiman's theorem, Duke Math. J. 113 (2002)
(3), 399-419.

[49] M. Chang, Erdős-Szemerédi sum-product problem, Annals of Math. 157 (2003),
939-957.

[50] M. Chang, A sum-product estimate in algebraic division algebras over R, Israel J.
Math. 150 (2005), 369.

[51] M. Chang, Additive and multiplicative structure in matrix spaces, preprint.

[52] B. Chazelle and J. Friedman, Point location among hyperplanes and unidirectional
ray-shooting, Comput. Geom. 4 (1994) (2), 53-62.

[53] S. Chen, On the size of finite Sidon sequences, Proc. Amer. Math. Soc. 121 (1994)
(2), 353-356.

[54] H. Chernoff, A measure of the asymptotic efficiency for tests of a hypothesis based
on the sum of observations, Ann. Math. Stat. 23 (1952), 493-509.

[55] S.L.G. Choi, The largest sum-free subsequence from a sequence of n numbers,
Proc. Amer. Math. Soc. 39 (1973), 42-44.

[56] S.L.G. Choi, P. Erdős and M. Nathanson, Lagrange's theorem with $N^{1/3}$ squares,
Proc. Am. Math. Soc. 79 (1980), 203-205.

[57] F. Chung, The number of distinct distances determined by n points in the plane, J.
Combin. Theory Ser. A 36 (1984), 342-354.

[58] F. Chung and R. Graham, Quasi-random subsets of $\mathbf{Z}_n$, J. Comb. Th. A 61 (1992),
64-86.

[59] F. Chung, R. Graham and R.M. Wilson, Quasi-random graphs, Combinatorica 9
(1989), 345-362.

[60] F. Chung, E. Szemerédi and W. Trotter, The number of different distances
determined by a set of n points in the Euclidean plane, Discrete Computational Geom.
7 (1992), 1-11.

[61] J. Cilleruelo, New upper bounds for finite $B_h$ sequences, Adv. Math. 159 (2001)
(1), 1-17.

[62] J. Cilleruelo, I. Ruzsa and C. Trujillo, Upper and lower bounds for finite $B_h[g]$
sequences, J. Number Theory 97 (2002) (1), 26-34.


[63] K. Clarkson, H. Edelsbrunner, L. Guibas, M. Sharir and E. Welzl, Combinatorial
complexity bounds for arrangements of curves and spheres, Discrete Comput.
Geom. 5 (1990), 99-160.

[64] K. Costello, T. Tao and V. Vu, Random symmetric matrices are almost surely
non-singular, Duke Math. J., to appear.

[65] E. Croot, Long arithmetic progressions in critical sets, J. Combin. Theory Ser. A
113 (2006) (1), 53-66.

[66] D. da Silva and Y. Hamidoune, Cyclic spaces for Grassmann derivatives and
additive theory, Bull. London Math. Soc. 26 (1994) (2), 140-146.

[67] S. Dasgupta, G. Károlyi, O. Serra and B. Szegedy, Transversals of additive
Latin squares, Israel J. Math 126 (2001), 17-28.

[68] H. Davenport, On the addition of residue classes, J. London Math. Soc. 10 (1935),
30-32.

[69] J.-M. Deshouillers, F. Hennecart and A. Plagne, On small sumsets in $(\mathbf{Z}/2\mathbf{Z})^n$,
Combinatorica 24 (1) (2004), 53-68.

[70] J.-M. Deshouillers and G. Freiman, When subset-sums do not cover all the residues
modulo p, J. Number Theory 104 (2004) (2), 255-262.

[71] J. Dieudonné, Une propriété des racines de l'unité, Collection of articles dedicated
to Alberto González Domínguez on his sixty-fifth birthday. Rev. Un. Mat. Argentina
25 (1970/71), 1-3.

[72] D.L. Donoho and P.B. Stark, Uncertainty principles and signal recovery, SIAM J.
Appl. Math. 49 (1989), 906-931.

[73] F.J. Dyson, A theorem on the densities of sets of integers, J. Lond. Math. Soc. 20
(1945), 8-14.

[74] F.J. Dyson, Statistical theory of the energy levels of complex systems. I, J. Math.
Phys. 3 (1962), 140-156.

[75] Y. Edel, Extensions of generalized product caps, Designs, Codes, and Cryptography
31 (2004), 5-14.

[76] G. Elekes, On the number of sums and products, Acta Arith. 81 (1997), 365-367.

[77] G. Elekes, Sums versus products in number theory, algebra and Erdős geometry,
Paul Erdős and his Mathematics, II (Budapest, 1999), Bolyai Soc. Math. Stud., 11
János Bolyai Math. Soc., (2002), 241-290.

[78] G. Elekes and Z. Király, On the combinatorics of projective mappings, J. Algebraic
Combin. 14 (2001) (3), 183-197.

[79] G. Elekes, M. Nathanson and I. Ruzsa, Convexity and sumsets, J. Number Theory
83 (2000) (2), 194-201.

[80] G. Elekes and I. Ruzsa, Few sums, many products, Studia Sci. Math. Hungar. 40
(2003) (3), 301-308.

[81] C. Elsholtz, Lower bounds for multidimensional sums, Combinatorica 24 (2004),
351-358.

[82] P. Erdős, On a lemma of Littlewood and Offord, Bull. Amer. Math. Soc. 51 (1945),
898-902.

[83] P. Erdős, On sets of distances of n points, Amer. Math. Monthly 53 (1946), 248-250.

[84] P. Erdős, Some remarks on the theory of graphs, Bull. Am. Math. Soc. 53 (1947),
292-294.

[85] P. Erdős, On the representation of large integers as sums of distinct summands
taken from a fixed set, Acta. Arith. 7 (1962), 345-354.




[86] 


[87] 


[88] 


[89] 


[90] 


[91] 


[92] 


[93] 


[94] 
[95] 
[96] 
[97] 
[98] 
[99] 
[100] 
[101] 
[102] 
[103] 
[104] 


[105] 




P. Erdős, Extremal problems in number theory. In Proceedings of the Symp. Pure 
Math. VIII, American Mathematical Society (1965), 181-189. 

P. Erdős, P. Frankl and V. Rödl, The asymptotic number of graphs not containing a 
fixed subgraph and a problem for hypergraphs having no exponent, Graphs Combin. 
2 (1986) (2), 113-121. 

P. Erdős, A. Ginzburg and A. Ziv, Theorem in the additive number theory, Bull. 
Res. Council Israel 10F (1961), 41-43. 

P. Erdős and R. Graham, Old and new problems and results in combinatorial number 
theory, Monographies de L’Enseignement Mathématique 28, Université de Genève, 
L’Enseignement Mathématique, Geneva, 1980. 

P. Erdős and M. Nathanson, Lagrange’s theorem and thin subsequences of squares. 
In Contribution to Probability, J. Gani and V.K. Rohatgi (eds), Academic Press, 
(1981), 3-9. 

P. Erdős and E. Szemerédi, On sums and products of integers. In Studies in Pure 
Mathematics; To the memory of Paul Turán, P. Erdős, L. Alpár, and G. Halász 
(eds), Akadémiai Kiadó-Birkhäuser Verlag, Budapest (1983), 213-218. 

P. Erdős, Problems and results in additive number theory. In Colloque sur la 
Théorie des Nombres, Bruxelles, 1955, George Thone & Masson and Cie, (1956), 
127-137. 

P. Erdős and L. Lovász, Problems and results on 3-chromatic hypergraphs and some 
related questions. In Infinite and Finite Sets (Colloq., Keszthely, 1973; dedicated 
to P. Erdős on his 60th birthday), Vol. II, Colloq. Math. Soc. János Bolyai, 10, 
North-Holland, (1975), 609-627. 

P. Erdős, C. Ko and R. Rado, Intersection theorems for systems of finite sets, Quart. 
J. Math. Oxford Ser. 2 12 (1961), 313-318. 

P. Erdős and R. Rado, Intersection theorems for systems of sets, J. London Math. 
Soc. 35 (1960), 85-90. 

J. Esary, F. Proschan and D. Walkup, Association of random variables with appli- 
cations, Ann. Math. Statist. 38 (1967), 1466-1476. 

P. Erdős and P. Tetali, Representations of integers as the sum of k terms, Random 
Structures Algorithms 1 (1990) (3), 245-261. 

P. Erdős and P. Turán, On a problem of Sidon in additive number theory and some 
related problems, J. London Math. Soc. 16 (1941), 212-215. 

P. Erdős and P. Turán, On some sequences of integers, J. London Math. Soc. 11 
(1936), 261-264. 

P. Erdős and P. Turán, On a problem of Sidon in additive number theory, and on 
some related problems, J. London Math. Soc. 16 (1941), 212-215. 

C.G. Esseen, On the Kolmogorov-Rogozin inequality for the concentration func- 
tion, Z. Wahrsch. Verw. Gebiete 5 (1966), 210-216. 

R.J. Evans and I.M. Isaacs, Generalized Vandermonde determinants and roots of 
unity of prime order, Proc. Amer. Math. Soc. 58 (1977), 51-54. 

J. Folkman, On the representation of integers as sums of distinct terms from a fixed 
sequence, Canad. J. Math. 18 (1966), 643-655. 

C.M. Fortuin, P.W. Kasteleyn and J. Ginibre, Correlation inequalities on some 
partially ordered sets, Comm. Math. Phys. 22 (1971), 89-103. 

K. Ford, Sums and products from a finite set of real numbers, Ramanujan Journal 
2 (1998) (1-2), 59-66. 


[106] 
[107] 


[108] 
[109] 
[110] 
[111] 


[112] 


[113] 


[114] 


[115] 


[116] 
[117] 


[118] 


[119] 


[120] 
[121] 


[122] 
[123] 
[124] 
[125] 


[126] 


[127] 




K. Ford, The distribution of integers with a divisor in a given interval, preprint. 

P. Frankl, R. Graham and V. Rödl, On subsets of abelian groups with no 3-term 
arithmetic progression, J. Combin. Theory Ser. A 45 (1987) (1), 157-161. 

P. Frankl and Z. Füredi, Solution of the Littlewood-Offord problem in high dimen- 
sions, Ann. of Math. (series 2) 128 (1988) (2), 259-270. 

P. Frankl and V. Rödl, The uniformity lemma for hypergraphs, Graphs and Com- 
binat. 8 (1992) (4), 309-312. 

P. Frankl and V. Rödl, Extremal problems on set systems, Random Struct. Algo- 
rithms 20 (2002) (2), 131-164. 

G. Freiman, Inverse problems in additive number theory, VI. On the addition of 
finite sets, III, Izv. Vyss. Ucebn. Zaved. Matem., 28 (1962) (3), 151-157. 

G. Freiman, Inverse problems of additive number theory, VII. On addition of finite 
sets, IV. The method of trigonometric sums, Izv. Vyss. Ucebn. Zaved. Matem., 28 
(1962) (3), 131-144. 

G. Freiman, Foundations of a Structural Theory of Set Addition, translated from the 
Russian. Translations of Mathematical Monographs, 37, American Mathematical 
Society (1973). 

G. Freiman, What is the structure of K if K + K is small? In Number Theory 
(New York, 1984-1985), Lecture Notes in Math. 1240, Springer-Verlag, (1987), 
109-134. 

G. Freiman, New analytical results in subset-sum problem. In Combinatorics and 
Algorithms (Jerusalem, 1988). Discrete Math. 114 (1993) (1-3), 205-217. 

G. Freiman, Structure theory of set addition, Asterisque 258 (1999), 1-33. 

G. Freiman, H. Halberstam and I. Ruzsa, Integer sum sets containing long arithmetic 
progressions, J. London Math. Soc. 46 (1992), 193-201. 

G. Freiman, A. Heppes and B. Uhrin, A lower estimation for the cardinality of 
finite difference sets in R^n. In Number Theory, Vol. I (Budapest, 1987), Colloq. 
Math. Soc. János Bolyai, 51, North-Holland (1990), 125-139. 

G. Freiman and W. Pigarev, The relation between the invariants R and T (in Rus- 
sian), Kalinin. Gos. Univ. Moscow, 1973, 172-174. 

P.E. Frenkel, Simple proof of Chebotarëv’s theorem, preprint, math.AC/0312398. 

H. Furstenberg, Ergodic behavior of diagonal measures and a theorem of Szemerédi 
on arithmetic progressions, J. Analyse Math. 31 (1977), 204-256. 

H. Furstenberg, Recurrence in Ergodic theory and Combinatorial Number Theory, 
Princeton University Press (1981). 

H. Furstenberg and Y. Katznelson, An ergodic Szemerédi theorem for commuting 
transformations, J. Analyse Math. 34 (1979), 275-291. 

H. Furstenberg and Y. Katznelson, A density version of the Hales—Jewett theorem, 
J. d’Analyse Math. 57 (1991), 64-119. 

H. Furstenberg, Y. Katznelson and D. Ornstein, The ergodic theoretical proof of 
Szemerédi’s theorem, Bull. Amer. Math. Soc. 7 (1982), 527-552. 

H. Furstenberg and B. Weiss, A mean ergodic theorem for 
$\frac{1}{N} \sum_{n=1}^{N} f(T^n x) g(T^{n^2} x)$. In Convergence in Ergodic Theory and Proba- 
bility (Columbus OH 1993), Ohio State Univ. Math. Res. Inst. Publ., 5, de Gruyter 
(1996), 193-227. 

W. Gao and Y.O. Hamidoune, On additive bases, Acta Arith. 88 (1999) (3), 233- 
237. 




[128] 
[129] 


[130] 
[131] 


[132] 
[133] 


[134] 
[135] 


[136] 
[137] 


[138] 
[139] 


[140] 
[141] 
[142] 
[143] 
[144] 
[145] 


[146] 


[147] 
[148] 


[149] 
[150] 


[151] 
[152] 
[153] 
[154] 


[155] 




R.J. Gardner, The Brunn—Minkowski inequality, Bull. Amer. Math. Soc. (N.S.) 39 
(2002) (3), 355-405. 

R.J. Gardner and P. Gronchi, A Brunn—Minkowski inequality for the integer lattice, 
Trans. Amer. Math. Soc. 353 (2001) (10), 3995-4024. 

J. Garibaldi, Erdős Distance Problem for Convex Metrics, UCLA Ph.D. Thesis. 

D. Goldstein, R. Guralnick and I. Isaacs, Inequalities for finite group permutation 
modules, Trans. Amer. Math. Soc. 357 (2005), 4017-4042. 

D. Goldston and C.Y. Yildirim, Higher correlations of divisor sums related to 
primes, I: Triple correlations, Integers 3 (2003) A5, 66pp. 

D. Goldston and C.Y. Yildirim, Higher correlations of divisor sums related to 
primes, II: k-correlations, preprint (available at AIM preprints) 

D. Goldston and C.Y. Yildirim, Small gaps between primes, I, preprint. 

I. Good, Short proof of a conjecture by Dyson, J. Mathematical Phys. 11 (1970), 
1884. 

T. Gowers, Lower bounds of tower type for Szemerédi’s uniformity lemma, GAFA 
7 (1997), 322-337. 

T. Gowers, A new proof of Szemerédi’s theorem for arithmetic progressions of 
length four, GAFA 8 (1998), 529-551. 

T. Gowers, A new proof of Szemerédi’s theorem, GAFA 11 (2001), 465-588. 

T. Gowers, Quasirandomness, counting, and regularity for 3-uniform hypergraphs, 
Comb. Probab. Comput. 15 (2006) (1-2), 143-184. 

T. Gowers, Hypergraph regularity and the multidimensional Szemerédi theorem, 
preprint. 

R. Graham, Complete sequences of polynomial values, Duke Math. J. 31 (1964), 
275-286. 

R. Graham, V. Rödl and A. Rucinski, On Schur properties of random subsets of 
integers, J. Numb. Theory 61 (1996), 388-408. 

R. Graham, B. Rothschild and J.H. Spencer, Ramsey Theory, Wiley, (1980). 

B. Green, Edinburgh lecture notes on Freiman’s theorem, unpublished. 

B. Green, The number of squares and B_h[g] sets, Acta Arith. 100 (2001) (4), 
365-390. 

B. Green, Some constructions in the inverse spectral theory of cyclic groups, Comb. 
Prob. Comp. 12 (2003) (2), 127-138. 

B. Green, Roth’s theorem in the primes, Annals of Math. 161 (2005) (3), 1609-1636. 

B. Green, On arithmetic structures in dense sets of integers, Duke Math. Jour. 114 
(2002) (2), 215-238. 

B. Green, Arithmetic progressions in sumsets, GAFA 12 (2002), 584-597. 

B. Green, A Szemerédi-type regularity lemma in abelian groups, GAFA 15 (2005) 
(2), 340-376. 

B. Green, Finite field models in arithmetic combinatorics. In Surveys in Combina- 
torics 2005, B.S. Webb (ed), Cambridge University Press, (2005), 1-27. 

B. Green, The polynomial Freiman—Ruzsa conjecture, unpublished. 

B. Green, Long arithmetic progressions of primes, preprint. 

B. Green and I. Ruzsa, Sets with small sumsets and rectification, Bull. London 
Math. Soc. 38 (2006) (1), 43-52. 

B. Green and I. Ruzsa, Counting sumsets and sum-free sets modulo a prime, Studia 
Sci. Math. Hungar. 41 (2004) (3), 285-293. 


[156] 


[157] 
[158] 


[159] 
[160] 
[161] 
[162] 
[163] 


[164] 


[165] 


[166] 
[167] 


[168] 
[169] 


[170] 
[171] 
[172] 


[173] 
[174] 


[175] 
[176] 
[177] 
[178] 
[179] 


[180] 


[181] 


[182] 




B. Green and I. Ruzsa, Counting sum-free sets in abelian groups, Israel J. Math 
147 (2005), 157-189. 

B. Green and I. Ruzsa, Freiman’s theorem in an arbitrary abelian group, preprint. 

B. Green and T. Tao, The primes contain arbitrarily long arithmetic progressions, 
Annals of Math., to appear. 

B. Green and T. Tao, Restriction theory of the Selberg sieve, and applications, 
preprint. 

B. Green and T. Tao, An inverse theorem for the Gowers U^3(G) norm, preprint. 

B. Green and T. Tao, Finite field analogues of Szemerédi’s theorem, preprint. 

B. Green and T. Tao, Compressions, convex geometry, and the Freiman-Bilu the- 
orem, preprint. 

J.R. Griggs, On the distribution of sums of residues, Bull. Amer. Math. Soc. 28 
(1993), 329-333. 

J.R. Griggs, Database security and the distribution of subset sums in R”. In Graph 
theory and Combinatorial Biology (Balatonlelle, 1996), Bolyai Soc. Math. Stud., 
7, János Bolyai Math. Soc., (1999), 223-252. 

J. Gunson, Proof of a conjecture of Dyson in the statistical theory of energy levels, 
J. Math. Phys., 3 (1962), 752-753. 

R.K. Guy, Unsolved Problems in Number Theory, Springer-Verlag, (1994). 

G. Halász, Estimates for the concentration function of combinatorial number theory 
and probability, Period. Math. Hungar. 8 (1977) (3-4), 197-211. 

H. Halberstam and K. Roth, Sequences, Springer-Verlag, (1966). 

A.W. Hales and R.I. Jewett, Regularity and positional games, Trans. Amer. Math. 
Soc. 106 (1963), 222-229. 

H. Halberstam and H.E. Richert, Sieve Methods, Academic Press, (1974). 

P. Hall, On representatives of subsets, J. London Math. Soc. 10 (1935), 26-30. 

Y. Hamidoune, A.S. Lladó and O. Serra, On subsets with small product in torsion- 
free groups, Combinatorica 18 (1998), 529-540. 

Y. Hamidoune, Private communication. 

Y. Hamidoune and Ø. Rødseth, An inverse theorem mod p, Acta Arith. 92 (2000) 
(3), 251-262. 

Y. Hamidoune and G. Zemor, On zero-free subset sums, Acta Arith. 78 (1996), 
(2), 143-152. 

H. Harborth, Ein Extremalproblem für Gitterpunkte, J. Reine Angew. Math. 
262/263 (1973), 356-360. 

D.R. Heath-Brown, Integer sets containing no arithmetic progressions, J. London 
Math. Soc. 35 (1987), 385-394. 

D.R. Heath-Brown, The number of primes in a short interval, J. Reine Angew. Math. 
389 (1988), 22-63. 

D.R. Heath-Brown, Three primes and an almost prime in arithmetic progression, 
J. London Math. Soc. (2) 23 (1981), 396-414. 

D.R. Heath-Brown and S. Konyagin, New bounds for Gauss sums derived from 
kth powers, and for Heilbronn’s exponential sum, Quart. J. Math. 51 (2000), 
221-235. 

N. Hegyvari, On the representation of integers as sums of distinct terms from a 
fixed set, Acta Arith., 92 (2000) (2), 99-104. 

H. Helfgott, Growth and generation in SL_2(Z/pZ), preprint. 




[183] 
[184] 
[185] 


[186] 
[187] 


[188] 
[189] 
[190] 
[191] 
[192] 
[193] 
[194] 
[195] 
[196] 


[197] 


[198] 
[199] 


[200] 
[201] 


[202] 
[203] 


[204] 
[205] 
[206] 


[207] 




G. Hoheisel, Primzahlprobleme in der Analysis, Sitz. Preuss. Akad. Wiss. 2 (1930), 
1-13. 

B. Host, Progressions arithmétiques dans les nombres premiers (d’après B. Green 
et T. Tao), Séminaire Bourbaki, 57ème année, 2004-2005 (944). 

B. Host and B. Kra, Nonconventional ergodic averages and nilmanifolds, Annals 
of Math. 161 (2005), 397-488. 

Q. Hou and Z. Sun, Restricted sums in a field, Acta Arith. 102 (2002) (3), 239-249. 

M. Huxley, On the difference between consecutive primes, Invent. Math. 15 (1972), 
164-170. 

A. Ingham, On the difference between consecutive primes, Quart. J. Math. Oxford. 
8 (1937), 255-266. 

A. Iosevich, Curvature, combinatorics, and the Fourier transform, Not. Amer. Math. 
Soc. 48 (2001), 577-583. 

S. Janson, Poisson approximation for large deviations, Random Structures Algo- 
rithms 1 (1990) (2), 221-229. 

X. Jia, B_h[g] sequences with large upper density, J. Number Theory 56 (1996), 
298-308. 

X. Jia, On finite Sidon sequences, J. Number Theory 44 (1993) (1), 84-92. 

R. Jin, Freiman’s conjecture and nonstandard methods, preprint. 

F. John, Extremum problems with inequalities as subsidiary conditions. In Studies 
and Essays presented to R. Courant on his 60th birthday, Interscience Publishers 
Inc., (1948), 187-204. 

J. Kahn, J. Komlós and E. Szemerédi, On the probability that a random ±1 matrix 
is singular, J. Amer. Math. Soc. 8 (1995), 223-240. 

G. Katona, A simple proof of the Erdős-Chao Ko-Rado theorem, J. Combin. Thy. 
Ser. B 13 (1972), 183-184. 

N. Katz and G. Tardos, A new entropy inequality for the Erdős distance problem. In 
Towards a Theory of Geometric Graphs, J. Pach (ed), Contemporary Mathematics 
342, American Mathematical Society, (2004), 119-126. 

N. Katz and T. Tao, Bounds on arithmetic progressions, and applications to the 
Kakeya conjecture, Math. Res. Letters 6 (1999), 625-630. 

N. Katz and T. Tao, Some connections between the Falconer and Furstenburg 
conjectures, New York J. Math. 7 (2001), 148-187. 

A. Kemnitz, On a lattice point problem, Ars Combin. 16 (1983), 151-160. 

J. Kemperman, On small sumsets in an abelian group, Acta Math. 103 (1960), 
63-88. 

J. Kemperman, On complexes in a semigroup, Indag. Math. 18 (1956), 247-254. 

J.H. Kim and V.H. Vu, Concentration of multivariate polynomials and its applica- 
tions, Combinatorica 20 (2000) (3), 417-434. 

J.H. Kim and V.H. Vu, Small complete arcs in projective planes, Combinatorica 
23 (2003) (2), 311-363. 

B. Klartag and V. Milman, Isomorphic Steiner symmetrization, Invent. Math. 153 
(2003), 463-485. 

D.J. Kleitman, On a lemma of Littlewood and Offord on the distributions of certain 
sums, Math. Z. 90 (1965), 251-259. 

D.J. Kleitman, On a lemma of Littlewood and Offord on the distributions of linear 
combinations of vectors, Adv. in Math. 5 (1970), 155-157. 


[208] 


[209] 
[210] 
[211] 
[212] 
[213] 
[214] 
[215] 
[216] 
[217] 
[218] 


[219] 
[220] 


[221] 
[222] 


[223] 


[224] 
[225] 
[226] 
[227] 
[228] 
[229] 


[230] 
[231] 




J. Komlós and M. Simonovits, Szemerédi’s regularity lemma and its applications 
in graph theory. In Combinatorics, Paul Erdős is Eighty, Vol. 2 (Keszthely, 1993), 
Bolyai Soc. Math. Stud., 2, János Bolyai Math. Soc., (1996), 295-352. 

S. Konyagin and I. Laba, Distance sets of well-distributed planar sets for polygonal 
norms, preprint. 

A. Kostochka and B. Sudakov, On Ramsey numbers of sparse graphs, Combina- 
torics, Probability and Computing 12 (2003), 627-641. 

M. Kneser, Abschätzungen der asymptotischen Dichte von Summenmengen, Math. 
Z. 58 (1953), 459-484. 

Y. Kohayakawa, T. Łuczak and V. Rödl, Arithmetic progressions of length three in 
subsets of a random set, Acta Arith. 75 (1996) (2), 133-163. 

M.N. Kolountzakis, The density of B_h[g] sequences and the minimum of dense 
cosine sums, J. Number Theory 56 (1996), 4-11. 

M.N. Kolountzakis, On the additive complements of the primes and sets of similar 
growth, Acta Arith. 77 (1996), 1-8. 

J. Komlós, On the determinant of (0, 1) matrices, Studia Sci. Math. Hungar. 2 
(1967), 7-22. 

J. Komlós, On the determinant of random matrices, Studia Sci. Math. Hungar., 3 
(1968), 387-399. 

B. Kra, The Green-Tao Theorem on arithmetic progressions in the primes: an 
ergodic point of view, Bull. Amer. Math. Soc. 43 (2006), 3-23. 

M. Krivelevich, S. Litsyn and A. Vardy, A lower bound on the density of sphere 
packings via graph theory, Int. Math. Res. Not. 43 (2004), 2271-2279. 

M. Krivelevich and B. Sudakov, Pseudo-random graphs, preprint. 

I. Laba, Fuglede’s conjecture for a union of two intervals, Proc. Amer. Math. Soc. 
129 (2001), 2965-2972. 

I. Laba and M. Lacey, On sets of integers not containing long arithmetic progres- 
sions, unpublished. 

T. Lam, Graphs without cycles of even length, Bull. Austral. Math. Soc. 63 (2001) 
(3), 435-440. 

M. Laczkovich and I. Ruzsa, The number of homothetic subsets. In The Mathe- 
matics of Paul Erdős, R. Graham and J. Nešetřil (eds), Springer-Verlag, (1996), 
294-302. 

T. Leighton, Complexity Issues in VLSI, Foundations of Computing Series, MIT 
Press, (1983). 

V. Lev, Structure theorem for multiple addition and the Frobenius problem, J. 
Number Theory 58 (1996) (1), 79-88. 

V. Lev, Optimal representations by sumsets and subset sums, Journal of Number 
Theory 62 (1997) (1), 127-143. 

V. Lev, On small sumsets in abelian groups. In Structure Theory of Set Addition, 
Asterisque 258 (1999), 317-321. 

V. Lev, Restricted set addition in groups. I. The classical setting, J. London Math. 
Soc. (Series 2) 62 (2000) (1), 27-40. 

V. Lev, Restricted set addition in groups. II. A generalization of the Erdős- 
Heilbronn conjecture, Electron. J. Combin. 7 (2000) (1), Research Paper 4, 10 pp. 

V. Lev, Restricted set addition in abelian groups: results and conjectures, preprint. 

V. Lev, Critical pairs in abelian groups and Kemperman’s theorem, preprint. 




[232] 
[233] 
[234] 
[235] 


[236] 


[237] 
[238] 
[239] 
[240] 
[241] 
[242] 
[243] 
[244] 
[245] 
[246] 
[247] 
[248] 
[249] 
[250] 
[251] 
[252] 
[253] 
[254] 


[255] 




V. Lev and S. Konyagin, Combinatorics and linear algebra of Freiman’s isomor- 
phism, Mathematika 47 (2000), 39-51. 

V. Lev and P. Smeliansky, On addition of two distinct sets of integers, Acta Arith. 
70 (1995) (1), 85-91. 

J. Liu and Z. Sun, Sums of subsets with polynomial restrictions, J. Number Theory 
97 (2002) (2), 301-304. 

E. Lipkin, Subset sums of sets of residues, Structure theory of set addition, Aster- 
isque 258 (1999), 187-193. 

A. Leibman, Host-Kra and Ziegler factors, and convergence of multiple averages. 
In Handbook of Dynamical Systems, vol. 1B, B. Hasselblatt and A. Katok (eds), 
Elsevier (2005), 745-841. 

J. Littlewood and C. Offord, On the number of real roots of a random algebraic 
equation III, Mat. Sbornik 12 (1943), 277-285. 

L. Lovász, Combinatorial Problems and Exercises, Second edition. North-Holland 
Publishing Co., (1993). 

L. Lovász and B. Szegedy, Szemerédi’s regularity lemma for the analyst, preprint. 

D. Lubell, A short proof of Sperner’s lemma, J. Comb. Theory 1 (1966), 299. 

T. Łuczak and T. Schoen, On the maximal density of sum-free sets, Acta Arith. 95 
(2000) (3), 225-229. 

A. Macbeath, On the measure of sum sets II. The sum-theorem for the torus, Proc. 
Cambridge Philos. Soc. 49 (1953), 40-43. 

H.B. Mann, A proof of the fundamental theorem on the density of sums of sets of 
positive integers, Annals of Math. 43 (1942), 523-527. 

H.B. Mann, Addition Theorems: The Addition Theorems of Group Theory and 
Number Theory, Interscience, (1965). 

J. Matoušek, Lectures on Discrete Geometry, Graduate Texts in Mathematics, 212, 
Springer-Verlag, (2002). 

L. Meshalkin, Generalization of Sperner’s theorem on the number of subsets of a 
finite set, Teor. Veroyatn. Primen 8 (1963), 219-220. 

R. Meshulam, An uncertainty inequality and zero subsums, Discrete Math. 84 
(1990), 197-200. 

R. Meshulam, On subsets of finite abelian groups with no 3-term arithmetic pro- 
gressions, J. Combin. Theory Ser. A. 71 (1995), 168-172. 

R. Meshulam, An uncertainty principle for finite abelian groups, preprint. 
math.CO/0312407 

V. Milman, Entropy and asymptotic geometry of non-symmetric convex bodies, 
Adv. in Math. 152 (2000), 314-335. 

G. Mockenhaupt and T. Tao, Kakeya and restriction phenomena for finite fields, 
Duke Math. J. 121 (2004), 35-74. 

L. Moser, On the different distances determined by n points, Amer. Math. Monthly 
59 (1952), 85-91. 

L. Moser, Notes on number theory II. On a theorem of van der Waerden, Canad. 
Math. Bull. 3 (1960), 23-25. 

B. Nagle, V. Rödl and M. Schacht, The counting lemma for regular k-uniform 
hypergraphs, Random Structures and Algorithms 26 (2006) (2), 1-67. 

M. Nathanson, Sums of finite sets of integers, Amer. Math. Monthly, 79 (1972), 
1010-1012. 


[256] 
[257] 
[258] 


[259] 


[260] 
[261] 
[262] 
[263] 
[264] 
[265] 
[266] 
[267] 
[268] 
[269] 
[270] 
[271] 
[272] 
[273] 
[274] 


[275] 


[276] 


[277] 




M. Nathanson, An inverse theorem for sums of sets of lattice points, J. Number 
Theory, 46 (1994), 29-59. 

M. Nathanson, Additive Number Theory. Inverse Problems and the Geometry of 
Sumsets, Graduate Texts in Mathematics 165, Springer-Verlag, (1996). 

M. Nathanson, On sums and products of integers, Proc. Am. Math. Soc. 125 (1997), 
9-16. 

M. Nathanson, Waring’s problem for sets of density zero. In Analytic Number The- 
ory, M. Knopp (ed), Lecture Notes in Mathematics 899, Springer-Verlag, (1980), 
302-310. 

M. Nathanson, Growth of sumsets in abelian semigroups, Semigroup Forum 61 
(2000), 149-153. 

M. Nathanson and I. Ruzsa, Polynomial growth of sumsets in abelian semigroups, 
J. Théor. Nombres Bordeaux 14 (2002) (2), 553-560. 

M. Nathanson and G. Tenenbaum, Inverse theorems and the number of sums and 
products. In Structure Theory of Set Addition, Astérisque 258 (1999), 195-204. 

M. Newman, On a theorem of Čebotarev, Linear and Multilinear Algebra 3 
(1975/76) (4), 259-262. 

K. O’Bryant, A complete annotated bibliography of work related to Sidon 
sequences, Electronic Journal of Combinatorics DS11, 39 pages, July 2004. 

J.E. Olson, An addition theorem modulo p, J. Combinatorial Theory 5 (1968), 
45-52. 

J. Olson, A combinatorial problem on finite Abelian groups. I, J. Number Theory 
1 (1969), 8-10. 

J. Olson, A combinatorial problem on finite Abelian groups. II, J. Number Theory 
1 (1969), 195-199. 

J.E. Olson, Sums of sets of group elements, Acta Arithmetica 28 (1975), 147- 
156. 

J. Pach, Crossing numbers. In Discrete and Computational Geometry (Tokyo, 
1998), Lecture Notes in Comput. Sci. 1763, Springer-Verlag, (2000), 267-273. 

J. Pach and G. Tardos, Isosceles triangles determined by a planar point set, Graphs 
and Combinatorics 18 (2002), 769-779. 

J. Pach and G. Tóth, Graphs drawn with few crossings per edge, Combinatorica 
17 (1997), 427-439. 

A. Plagne, A new upper bound for B2[2] sets, J. Combin. Theory Ser. A 93 (2001) 
(2), 378-384. 

H. Plünnecke, Eigenschaften und Abschätzungen von Wirkungsfunktionen, BMwF- 
GMD-22, Gesellschaft für Mathematik und Datenverarbeitung, Bonn 1969. 

C. Pomerance and A. Sárközy, Combinatorial number theory. In Handbook of 
Combinatorics Vol. 1, Elsevier, (1995), 967-1018. 

T. Przebinda, Three uncertainty principles for a locally compact abelian group. In 
Representations of Real and p-adic Groups, Lect. Notes Ser. Inst. Math. Sci., Nat. 
Univ. Singap., 2, Singapore University Press, (2004), 1-18. 

F.P. Ramsey, On a problem of formal logic, Proc. London Math. Soc. 30 (1930), 
264-285. 

R.A. Rankin, Sets of integers containing not more than a given number of terms 
in arithmetic progression, Proc. Roy. Soc. Edinburgh Sect. A 65 (1960/1961), 
332-344. 







[278] 
[279] 
[280] 
[281] 
[282] 
[283] 
[284] 
[285] 
[286] 
[287] 
[288] 


[289] 


[290] 
[291] 


[292] 
[293] 
[294] 
[295] 
[296] 
[297] 
[298] 
[299] 
[300] 
[301] 


[302] 




Yu. G. Rešetnyak, New proof of a theorem of N.G. Čebotarev (in Russian), Uspehi 
Mat. Nauk (N.S.) 10 (1955) (3) (65), 155-157. 

C. Reiher, Kemnitz’s conjecture concerning lattice points in the plane, preprint. 

A. Robertson and D. Zeilberger, A 2-coloring of [1, N] can have (1/22)N^2 + O(N) 
monochromatic Schur triples, but not less!, Electronic Journal of Combinatorics, 
5 (1998), R19. 

V. Rödl, B. Nagle, J. Skokan, M. Schacht and Y. Kohayakawa, The hypergraph 
regularity method and its applications, Proc. Nat. Acad. Sci. 102 (2005), 8109- 
8113. 

V. Rödl and M. Schacht, Regular partitions of hypergraphs, preprint. 

V. Rödl and J. Skokan, Regularity lemma for k-uniform hypergraphs, Random 
Structures Algorithms 25 (2004) (1), 1-42. 

V. Rödl and J. Skokan, Applications of the regularity lemma for uniform hyper- 
graphs, Random Structures and Algorithms 28 (2006), 180-194. 

C. Rogers and G. Shephard, The difference body of a convex body, Arch. Math. 8 
(1957), 220-233. 

L. Rónyai, On a conjecture of Kemnitz, Combinatorica 20 (2000), 569-573. 

K.F. Roth, On certain sets of integers, J. London Math. Soc. 28 (1953), 245-252. 

K.F. Roth, Irregularities of sequences relative to arithmetic progressions, IV. Period. 
Math. Hungar. 2 (1972), 301-326. 

I. Ruzsa, On the cardinality of A + A and A - A. In Combinatorics (Keszthely, 
1976), Coll. Math. Soc. Bolyai 18, Akadémiai Kiadó (1979), 933-938. 

I. Ruzsa, Arithmetic progressions in sumsets, Acta Arith. 60 (1991), 191-202. 

I. Ruzsa, On the number of sums and differences, Acta Math. Hung. 59 (1992), 
439-447. 

I. Ruzsa, A concavity property for the measure of product sets in groups, Fund. 
Math. 140 (1992) (3), 247-254. 

I. Ruzsa, On the additive completion of primes, Acta Arith. 86 (1998) (3), 269- 
275. 

I. Ruzsa, Solving a linear equation in a set of integers, I, Acta Arith. 65 (1993), 
259-282. 

I. Ruzsa, Generalized arithmetical progressions and sumsets, Acta Math. Hungar. 
65 (1994) (4), 379-388. 

I. Ruzsa, Sum of sets in several dimensions, Combinatorica 14 (1994), 485-490. 

I. Ruzsa, Sums of finite sets. In Number Theory: New York Seminar, D.V. Chud- 
novsky, G.V. Chudnovsky and M.B. Nathanson (eds), Springer-Verlag, (1996), 
281-293. 

I. Ruzsa, An infinite Sidon sequence, J. Number Theory 68 (1998) (1), 63-71. 

I. Ruzsa, A small maximal Sidon set, Paul Erdős (1913-1996), Ramanujan J. 2 
(1998) (1-2), 55-58. 

I. Ruzsa, An analog of Freiman’s theorem in groups. In Structure Theory of Set 
Addition, Astérisque 258 (1999), 323-326. 

I. Ruzsa, An almost polynomial Sidon sequence, Studia Sci. Math. Hungar. 38 
(2001), 367-375. 

I. Ruzsa, A problem on restricted sumsets. In Towards a Theory of Geomet- 
ric Graphs, J. Pach (ed), Contemp. Math., 342, American Mathematical Society 
(2004), 245-246. 


[303] 
[304] 


[305] 
[306] 
[307] 
[308] 
[309] 
[310] 


[311] 
[312] 


[313] 


[314] 
[315] 


[316] 
[317] 
[318] 


[319] 
[320] 


[321] 
[322] 


[323] 


[324] 
[325] 
[326] 
[327] 


[328] 




I. Ruzsa, Sum-avoiding sumsets, preprint. 

I. Ruzsa and E. Szemerédi, Triple systems with no six points carrying three trian- 
gles, Colloq. Math. Soc. J. Bolyai 18 (1978), 939-945. 

R. Salem and D.C. Spencer, On sets of integers which contain no three terms in 
arithmetic progression, Proc. Nat. Acad. Sci. 32 (1942), 561-563. 

L. Santaló, Un invariante afin para los cuerpos convexos del espacio de n dimen- 
siones, Portugaliae Math. 8 (1949), 155-161. 

A. Sárközy, Finite addition theorems. I, J. Number Theory 32 (1989), 114-130. 

A. Sárközy, Finite addition theorems. II, J. Number Theory 48 (1994) (2), 197-218. 

A. Sárközy, On finite addition theorems. In Structure Theory of Set Addition, Astér- 
isque 258 (1999), xi-xii, 109-127. 

A. Sárközy and E. Szemerédi, Über ein Problem von Erdős und Moser, Acta Arith. 
11 (1965), 205-208. 

P. Scherk, An inequality for sets of integers, Pacific J. Math. 5 (1955), 585-587. 

L. Schnirelmann, Über additive Eigenschaften von Zahlen, Annals Inst. Polyt. 
Novocherkassk 14 (1930), 3-28; Math. Annalen 107 (1933), 649-690. 

T. Schoen, The number of monochromatic Schur triples, European J. Combin. 20 
(1999), 855-866. 

I.D. Shkredov, On a problem of Gowers, preprint. 

I. Schur, Über die Kongruenz x^m + y^m ≡ z^m (mod p), Jber. Deutsch. Math.-Verein. 
25 (1916), 114-116. 

J. Shearer, A note on the independence number of triangle-free graphs, Discrete 
Mathematics, 46 (1983), 83-87. 

J. Shearer, A note on the independence number of triangle-free graphs II, J. Combin. 
Theory Ser. B. 53 (1991), 300-307. 

S. Shelah, Primitive recursive bounds for van der Waerden numbers, J. Amer. Math. 
Soc. 1 (1988), 683-697. 

S. Sidon, On B2-sequences, Math. Annalen 106, (1932), 536-539. 

J. Singer, A theorem in finite projective geometry and some applications to number 
theory, Trans. Amer. Math. Soc. 43 (1938), 377-385. 

K.T. Smith, The uncertainty principle on groups, SIAM J. Appl. Math. 50 (1990), 
876-882. 

H. Snevily, The Cayley Addition Table of Z_n, Amer. Math. Monthly 106 (1999), 
584-585. 

J. Solymosi, Note on a generalization of Roth’s theorem. In Discrete and 
Computational Geometry, Algorithms Combin. 25, Springer-Verlag, (2003), 825- 
827. 

J. Solymosi, On the number of sums and products, Bull. London Math. Soc. 37 
(2005) (4), 491-494. 

J. Solymosi, On sumsets and product sets of complex numbers, preprint. 

J. Solymosi and V. Vu, Distinct distances in high dimensional homogeneous sets. 
In Towards a Theory of Geometric Graphs, Contemp. Math., 342, American Math- 
ematical Society (2004), 259-268. 

J. Solymosi and V. Vu, Near optimal bound for the distinct distances problem in 
high dimensions, Combinatorica, to appear. 

J. Solymosi and C.D. Tóth, Distinct distances in the plane, Discrete Comput. Geom. 
25 (2001) (4), 629-634. 




[329] 
[330] 
[331] 
[332] 
[333] 
[334] 
[335] 
[336] 
[337] 
[338] 
[339] 
[340] 
[341] 
[342] 
[343] 
[344] 
[345] 


[346] 


[347] 
[348] 
[349] 
[350] 


[351] 




J. Solymosi, G. Tardos and C.D. Tóth, The k most frequent distances in the plane, 
Discrete Comput. Geom. 28 (2002) (4), 639-648. 

R. Stanley, Weyl groups, the hard Lefschetz problem, and the Sperner property, 
SIAM J. Alg. Disc. Math. 1 (1980), 168-184. 

J. Spencer, Four squares with few squares. In Number Theory, New York Seminar 
1991-1995, D.V. Chudnovsky et al. (eds), Springer-Verlag, 295-297. 

E. Sperner, Ein Satz über Untermengen einer endlichen Menge, Math. Z. 27 (1928), 
544-548. 

Y. Stanchescu, On addition of two distinct sets of integers, Acta Arith. 75 (1996) 
(2), 191-194. 

Y. Stanchescu, On the structure of sets with small doubling property on the plane, 
Acta Arith. 83 (1998), 127-141. 

Y. Stanchescu, On finite difference sets, Acta Math. Hungar. 79 (1998), 123-138. 

J. Steinig, On Freiman’s theorems concerning the sum of two finite sets of integers. 
In Preprints of the Conference on the Structural Theory of Set Addition, CIRM, 
Marseille (1993), 173-186. 

S.A. Stepanov, The number of points on a hyperelliptic curve over a prime field, 
Izv. Akad. Nauk SSSR Ser. Mater. 33 (1969), 1171-1181. 

P. Stevenhagen and H.W. Lenstra Jr., Chebotarëv and his density theorem, Math. 
Intelligencer 18 (1996) (2), 26-37. 

A. Stöhr, Gelöste und ungelöste Fragen über Basen der natürlichen Zahlenreihe. I, 
II, J. Reine Angew. Math. 194 (1955), 40-65; 111-140. 

B. Sudakov, E. Szemerédi and V. Vu, On a question of Erdős and Moser, Duke 
Math. J. 129 (2005) (1), 129-155. 

Z.W. Sun, Unification of zero-sum problems, subset sums and covers of Z, Electron. 
Res. Announc. Amer. Math. Soc. , 9 (2003), 51—60. 

L. Székely, Crossing numbers and hard Erdős problems in discrete geometry, Com- 
bin. Probab. Comput. 6 (1997), 353-358. 

E. Szemerédi, On sets of integers containing no four elements in arithmetic pro- 
gression, Acta Math. Acad. Sci. Hungar. 20 (1969), 89-104. 

E. Szemerédi, Integer sets containing no arithmetic progressions. Acta Math. Hun- 
gar. 56 (1990) (1-2), 155-158. 

E. Szemerédi, On sets of integers containing no k elements in arithmetic progres- 
sion, Acta Arith. 27 (1975), 299-345. 

E. Szemerédi, Regular partitions of graphs. In Problemés Combinatoires et Théorie 
des Graphes, Proc. Colloque Inter. CNRS, Bermond, Fournier, M. Las Vergnas and 
Sotteau (eds), CNRS Paris (1978), 399-401. 

E. Szemerédi, On a conjecture of Erdős and Heilbronn, Acta Arith. 17 (1970) 
227-229. 

E. Szemerédi and W. T. Trotter Jr., Extremal problems in discrete geometry, Com- 
binatorica 3 (1983), 381-392. 

E. Szemerédi and V. Vu, Long arithmetic progressions in sum sets and the number 
of x-sum-free sets, Proc. London Math. Soc. 90 (2005) (2), 273-296. 

E. Szemerédi and V. Vu, Long arithmetic progressions in sumsets: thresholds and 
bounds, J. Amer. Math. Soc. 19 (2006) (1), 119-169. 

E. Szemerédi and V. Vu, Finite and infinite arithmetic progressions in sumsets, 
Annals of Math. (2) 163 (2006) (1), 1-35. 


[352] E. Szemerédi and V. Vu, Olson theorem revisited, preprint.

[353] G. Tardos, On distinct sums and distinct distances, Adv. in Math. 180 (2003) (1), 275-289.

[354] T. Tao, Finite field analogues of the Erdős, Falconer, and Furstenberg problems, unpublished.

[355] T. Tao, An uncertainty principle for cyclic groups of prime order, Math. Res. Lett. 12 (2005) (1), 121-127.

[356] T. Tao, Recent progress on the Restriction conjecture, Park City lecture notes.

[357] T. Tao, A quantitative ergodic theory proof of Szemerédi’s theorem, preprint.

[358] T. Tao, Arithmetic progressions and the primes, El Escorial lecture notes.

[359] T. Tao, Szemerédi’s regularity lemma revisited, preprint.

[360] T. Tao, A variant of the hypergraph removal lemma, preprint.

[361] T. Tao, Obstructions to uniformity, and arithmetic patterns in the primes, preprint.

[362] T. Tao, Product set estimates for non-commutative groups, preprint.

[363] T. Tao, A remark on Goldston–Yıldırım correlation estimates, unpublished.

[364] T. Tao and V. Vu, On random ±1 matrices: Singularity and Determinant, Random Structures Algorithms 28 (2006) (1), 1-23.

[365] T. Tao and V. Vu, On the singularity probability of random Bernoulli matrices, J. Amer. Math. Soc., to appear.

[366] T. Tao and V. Vu, Inverse Littlewood–Offord theorems, and the least singular value of random Bernoulli matrices, preprint.

[367] T. Tao and V. Vu, Littlewood–Offord problem in high dimensions, preprint.

[368] C. Tóth, The Szemerédi–Trotter theorem in the complex plane, preprint.

[369] P. Turán, On a theorem of Hardy and Ramanujan, J. London Math. Soc. 9 (1934), 274-276.

[370] J.G. van der Corput, Über Summen von Primzahlen und Primzahlquadraten, Math. Ann. 116 (1939), 1-50.

[371] B.L. van der Waerden, Beweis einer Baudetschen Vermutung, Nieuw. Arch. Wisk. 15 (1927), 212-216.

[372] P. Varnavides, On certain sets of positive density, J. London Math. Soc. 39 (1959), 358-360.

[373] R.C. Vaughan, The Hardy-Littlewood Method, Second edition, Cambridge Tracts in Mathematics 125, Cambridge University Press 1997.

[374] T. Voigt and G. Ziegler, Singular 0/1 matrices and the hyperplanes spanned by random 0/1 vectors, preprint.

[375] A.G. Vosper, The critical pairs of subsets of a group of prime order, J. London Math. Soc. 31 (1956), 200-205.

[376] V. Vu, High order complementary bases of primes, Integers 2 (2002), paper no. A12.

[377] V. Vu, Concentration of non-Lipschitz functions and applications, Random Structures Algorithms 20 (2002) (3), 262-316.

[378] V. Vu, On the concentration of multivariate polynomials with small expectation, Random Structures Algorithms 16 (2000) (4), 344-363.

[379] V. Vu, On a refinement of Waring’s problem, Duke Math. J. 105 (2000) (1), 107-134.

[380] V. Vu, Some results on subset sums, preprint.

[381] V. Vu, On a question of Gowers, Ann. Comb. 6 (2002) (2), 229-233.


[382] T. Wooley, On Vu’s thin basis theorem in Waring’s problem, Duke Math. J. 120 (2003) (1), 1-34.

[383] K. Wilson, Proof of a conjecture by Dyson, J. Math. Phys. 3 (1962), 1040-1043.

[384] E. Wirsing, Thin subbases, Analysis 6 (1986), 285-308.

[385] K. Yamamoto, Logarithmic order of free distributive lattice, J. Math. Soc. Japan 6 (1954), 343-353.

[386] T. Ziegler, Universal characteristic factors and Furstenberg averages, preprint.

[387] J. Zöllner, Der Vier-Quadrate-Satz und ein Problem von Erdős und Nathanson, Ph.D thesis, Johannes Gutenberg-Universität, Mainz (1984).

[388] J. Zöllner, Über eine Vermutung von Choi, Erdős und Nathanson, Acta Arith. 45 (1985), 211-213.


Index 





a-pseudo-random, 161 

a-spectrum, 181 

a-uniform, 161, 164 

b0), 420 

Bp set, 57; see also Sidon set 

By, set, 172, 226, 253 

Bn[g] set, 35, 36, 37,172 

By set, 237 

Cov, 156 

d-fold powers, xvii 
product set, xvii 

ε-regularity, 406

(1 — €)-complementary base of order k, 21 

E ∨ F, 1

E ∧ F, 1

E(X), 1 

E,1 

E(X, F), 2 

E ;-monochromatic, 255 

I(P), xv 

3k — 3 theorem, 207 

K-constant, 456 

(K, 2)-constant, 461 

(K, o)-almost periodic, 399 

k-pseudo-random, 464 

K-quasiperiodic, 399

k-simplex, 454 

K-approximate group, 57

k-choosable, 260 

k-dissociated, 294 

k-poor, 319 

k-rich, 319 

L^p theory, 156

L^p(Z) norm, 156

mes(A), 122 

QU F(n)), xvi 


Oncol f (n)), xvi 

OCF (n)), xvi 

2; (X), 88 

Õ;(Y), 88 

On > 00(f (n)), xvi 
O(f(n)), xvi 

p-torsion groups, 378 
P(E), 1 

σ-algebra, 401, 441, 448
Sym_α(A), 84

O(f(n)), xvi 

l4, XV 

Var(X), 1 

vol(P), xvii 

A(p) constants, 149, 172, 227, 318 


Abel summation method, 48, 49 

technique, 47 
Ackermann bound, 414 
Ackermann growth, 259 
additive energy, 57, 61, 63, 83, 84, 107, 149, 157, 164, 179, 181, 188, 290, 359, 419, 434
additive geometry, 112, 239 
additive group, xiii, 113 
finite, 57 
additive number theory, 35, 51 
additive set, xii, 51 
random, xiii 

additive structure, xi 
adjoint, 155 

functor, 237 
almost periodic, 192, 399, 441, 448 

of order d — 1, 443 
almost periodic primes, 466 
almost periodic sets, 149 
ambient group, xii, 65, 66, 92 




anti-chain, 277 
anti-product, 279 
apex, 319 
approximate group, 52, 53, 74, 75, 76, 113, 120, 
198, 226 
K-, 74, 77 
multiplicative K -, 93 
arithmetic combinatorics, xi 
arithmetic progression, 120, 143, 471, 474, 
477 
generalized, xiv, 52, 119, 121; see also 
generalized arithmetic progression 
arithmetically structured sets, 371 
unstructured sets, 371 
asymmetric sum set estimates, 74 
asymptotic complementary base of order k, 
21 
at threshold a, 84 
atom, 401 
Azuma’s inequality, 17, 34 


bad pair, 263 
Balog—Szemerédi—Gowers theorem, xiv, 53, 63, 
78, 81, 82, 85, 111, 228, 246, 426, 434, 
440 
asymmetric, 87 
non-commutative, 96 
Banach algebra properties, 444, 445 
base, 480 
of order k, 12, 18 
point, xvii 
basis vectors, xvii 
Beck’s lemma, 325 
theorem, 314 
Behrend’s example, 376 
Bernoulli matrices, 297 
Bernoulli variable, 11 
Bernstein’s inequality, 178 
Bertrand’s postulate, 49, 229, 387 
bias, 149 
Fourier, 160 
linear, 149, 160, 374 
bilinear form, non-degenerate, 150 
symmetric, 150 
bipartite graph, 79, 83, 247, 265, 266 
commutative, 268 
dense, 261 
direct sum, 270 
directed, 267 
Blichfeldt’s lemma, 134
Bohr neighborhoods, 165 




Bohr sets, 112, 165, 166, 168, 170, 297, 
378, 392 
regular, 170 
size bounds, 166 
Bonami—Beckner inequality, 179 
Bonferroni inequalities, 5 
boolean, 2 
polynomial, 27 
quadratic, 28 
Borel—Cantelli lemma, 3, 36, 40 
Bourgain’s bound, 416 
theorem, 194, 470 
Brunn’s inequality, 130 
Brunn—Minkowski inequality, 123, 126, 127, 
129, 210, 215 


Cov, 156 
canonical projection, xviii 
capset, 372, 378 
cardinality, xv 
Cartesian product, xv 
Cauchy’s theorem, 118 
Cauchy—Davenport inequality, 200, 205, 206, 
209, 284, 289, 330, 333, 365, 479 
Cauchy—Schwarz inequality for bipartite graphs, 
247 
inequality, 9 
Cayley graph, 247 
cells, 321 
decomposition, 308, 321, 325 
chain, 278 
centered, 278 
connected, 278 
decomposition lemma, 278 
length, 278 
Chang’s theorem for r-torsion groups, 228 
Chang’s covering lemma, 53, 229 
theorem, 189, 430, 470 
character, 149, 152, 154 
characteristic function, 157 
characteristic, 330, 346 
Chebotarev’s lemma, 365 
Chebyshev’s inequality, 6, 9 
theorem, 49 
Chernoff’s inequality, 11, 18, 19, 33, 37, 163, 
176 
Chevalley—Waring method, xiv 
Chevalley—Waring theorem, 329, 347, 355 
circle group, xvi 
circle shift, 451 
clique, 266 


coalescence of progressions, 474 

coalescence of arithmetic progressions, 484 

collinear triples, 313 

colorful (coloring), 26 

Combinatorial Nullstellensatz, 330, 331, 355

commutative, abstractly, 275 

complementary base, 16 
asymptotic, 21 
basis of order k, 15 

complete additive set, 485 
graph, 254 
sequence, 480 

conditional expectation, 2, 401 

conditional probability, 2 

convex body, 122, 125, 126 
symmetric, 123 

convex set, 122 

convolution, 149, 153 

correlation condition, 464 

coset progression, 167, 168, 240 
proper, 167 
symmetric, 167 

counting function, 12, 37 

covariance, 7 

covering lemmas, 70 

covolume, 115 

crossing number, 308, 309 
inequality, xiv, 309 

cryptography, 109 

cube covering lemma, 181 

cube, d-dimensional, xiv 

cut set, 271 

cycle, 247 

cyclic group, 4, 113, 224, 280, 346, 429, 479 
of order N, xvi 

cyclotomic fields, 330, 362 
polynomial, 362 


Davenport number, 350, 353 

decaying function, xvi 

decoupling inequality, 305 

degree map, 235 
of a vertex, 247 

density increment, 427, 441 

density argument, 375, 379, 398, 414, 425 

density of a non-trivial hyperplane, 301 
of a group, 151 

dependency graph, 24 

Diderrich’s problem, 485 

difference constant, 57, 58, 64, 222 

difference operator, 425 




difference set, xii 
complete, 79
partial, 79 
Diffie-Hellman distribution, 188 
dilate, 54 
dilation, xii 
dimension, xvii 
directional basis, 136 
Dirichlet’s theorem, 367 
discrete box, xix, 131 
discrete John theorem, 243 
discrete parabola, 59 
dissociated set, 175, 383 
distinct distances problem, 319 
divide and conquer martingale technique, 35 
divisibility criteria, 329 
dot product, 114 
double counting, 308 
doubling constant, 51, 52, 57, 58, 59, 64, 67, 68, 
75, 112, 122, 170, 198, 217, 222 
drawing, 309 
dual function, 442, 444, 447 
Dyson’s conjecture, 338, 342 


e-transform, 199 
e-transform method, 210 
edges, 247 
bad, 408 
k-coloring, 254 
density, 406 
set, 265, 454 
Eisenstein’s criterion, 330, 363, 367 
elementary prime number estimates, 46 
ellipsoid, 123 
embedding, 233 
energy, 222, 403, 459 
increment, 427, 441, 446 
argument, 375, 398, 414, 415, 425 
strategy, 417 
entropy uncertainty principle, 160 
Erdős’s Littlewood–Offord inequality, 279
Erdős–Szemerédi conjecture, 315
Erdős–Turán conjecture, 370, 372, 378, 463
constant, 369
inequality, 165
Erdős distance problem, 324
Erdős–Heilbronn conjecture, 336
generalized, 341
Erdős–Ko–Rado theorem, 280
ergodic system, 453 




Esseen concentration inequality, 290, 292
Euler product, 465 
Euler totient function, 344 
Euler’s constant, 50 
Euler’s formula, 310 
exact inverse sum set theorem, 55 
expectation, 1 

of a complex-valued function, 151 
exponential moment, 10 

method, 10, 17, 30, 35 
exponential map, 152 


finite additive group, fundamental theorem of, 
117 
falling factorial, 336 
filling lemma, 477 
finite fields, 330, 345 
Bertrand’s postulate, 349 
of order q, xvi 
prime number theorem, 349 
Riemann hypothesis, 349 
finitely generated, 114 
first moment method, 2, 7, 109, 224, 248 
FKG inequality, 19, 27 
Fourier analysis, 149, 239 
ergodic theory, 151 
higher-order, 416 
quadratic, 416 
Fourier bias, 163, 172, 196, 470 
Fourier coefficient, 149, 152, 164 
Fourier concentration lemma, 182 
Fourier inversion formula, 153 
Fourier representation, 282 
Fourier transform, 152, 157, 245 
Fourier uniformity, 455 
Freiman 2” theorem, 217 
Freiman cube lemma, 215, 217, 476 
Freiman dimension, 223, 235, 236, 244 
Freiman duality, 238 
Freiman homomorphisms, xiv, 4, 70, 113, 198, 
220, 221, 222, 223, 224, 225, 233, 235, 
236, 237, 238, 471 
of order 2, 155 
Freiman isomorphic, 113, 221, 223, 225, 233, 
234, 235, 237, 238, 250 
of order k, 220 
Freiman isomorphisms, 70, 222, 224 
Freiman rectification principle, 245 
for sum-products, 103 
Freiman theorem, 52, 97, 112, 142, 198, 209 
in an arbitrary group, 240 




for r-torsion groups, 227 
for torsion-free groups, 230 
Freiman—Ruzsa conjecture, 439 
polynomial, 232 
frequency set, 166 
Frobenius map, 350 
Furstenberg recurrence theorem, 451, 453 
multiple recurrence theorem, 415, 449 
Furstenberg—Weiss example, 429 


Gallai’s theorem, 261 
Gamma function, 126 
Gauss sum estimate, 162, 187 
Gauss’s circle problem, 132 
Gelfand transform, 237 
generalized almost periodic functions, 415 
arithmetic progression, xix, 75, 471, 474 
of rank 2, 52 
generalized Koopman-von Neumann structure 
theorem, 467 
generalized Vandermonde determinants, 364 
generalized von Neumann theorem, 440, 450, 
466 
geometry of numbers, 112 
Goldbach conjecture, 12, 51, 192 
good, 22, 23 
good quadruple, 326 
Gowers uniform, 423, 424, 440 
Gowers inner product, 419 
Gowers theorem, 259 
Gowers triangle inequality, 420, 423 
Gowers uniform functions, 447 
of order k — 1, 450 
of order k — 2, 422, 466 
of order 1, 161 
Gowers uniformity, 425, 429, 441 
estimate, 445, 467 
norm, 416, 417, 418, 442, 450 
of order d, 445 
Gowers—Cauchy—Schwarz inequality, 420, 442 
graph, 247 
bipartite, 247 
complete, 254 
planar, 309 
triangle-free, 251 
graph theory, 246 
Green—Ruzsa’s covering lemma, 53, 71, 228 
Green—Tao theorem, xiv, 370, 398, 448, 463, 
466, 469 
group homomorphism, 113 
isomorphism, 113 




Hahn-Banach theorem, quadratic, 432 
Hales—Jewett theorem, 254, 257 

density, 452 

multidimensional, 260 
Halász inequality, 289 

concentration inequality, 286 

relative concentration inequality, 284 
Hamming distance, 266 
Hardy–Littlewood circle method, 12, 45, 150, 463
Hardy–Littlewood maximal function, 171
inequality, 170, 171

Hardy–Littlewood prime tuples conjecture, 466
Hausdorff—Young inequality, 156, 160, 179 
Hlawka—Minkowski problem, 254 
Hoeffding’s inequality, 17, 33 
homogeneous, 34 

finite set, 321 
hypergraphs, 454, 458, 460 

regularity lemma, 415, 461 

k-uniform, 454 

k-partite, 265 
hyperplane, non-trivial, 301 
Hölder’s inequality, 159


independent set, 27, 247 
index, 117 
indicator function, xv 
infinitely divisible, 237 
integer lattice, xv 
inverse Halász inequality, 296 
inverse Littlewood–Offord theorem, 292, 296
inverse sum set estimates, 276 
problem, 51 
irreducible monic polynomial, 348 
isomorphisms, 233 
isoperimetric inequality, 130 
isosceles triangle, 319 
base, 319 
narrow, 319 
iterated convolutions, 160 
iterated sum sets, xii, 54, 65, 70 


Janson’s inequality, 27, 32, 33 

Jensen’s inequality, 9, 32, 160 

John’s lemma, discrete, 131 

John’s theorem, 123, 142, 143 
discrete, 141 


Katz—Tao lemma, 101 
Kemnitz’s conjecture, 354, 355 


Khintchine’s inequality, 178 

Khintchine’s recurrence theorem, 453 

Kneser’s theorem, 200, 207, 209, 210, 473, 485 

Koopman—von Neumann theorems, 456, 454, 
467 

decomposition, 400, 445 

Kronecker’s approximation theorem, 133, 167, 
227, 389 

Kronecker’s factor, 451 


Lagrange’s theorem, 51 
interpolation formula, 339, 341 
large deviation inequalities, 6, 27 
large sieve inequality, 384 
lattice, 114 
fundamental theorem, 115 
quotient of two, 116 
law of large numbers, 159 
Legendre’s theorem, 12 
linear bias, 149, 375, 378, 418 
of a function, 374 
linear forms condition, 464 
linear phase function, 154, 424 
linearity of expectation, 3, 8 
linearly uniform, 161 
Lipschitz constant, 33 
concentration inequality, 33 
Littlewood—Offord problem, 276 
inverse, 276 
Lovász local lemma, 24
lower density, 21, 209 
lower tail, 6 
estimate, 2, 5, 27 
probability, 27 
LYM inequality, 277 


magnification ratio, 267 
magnitude, xv 
Mahler basis, 140 
theorem, 140 
Mann’s theorem, 203 
Mann–Kneser–Macbeath inequality, 210
Markov’s inequality, 2, 5, 36, 177 
martingale difference sequence, 17, 34 
Marton conjecture, 228, 232 
matching, 407 
induced, 407 
mean, of a complex valued function, 151 
measure-preserving system, 448 
compact, 448 
strongly mixing, 449 




Menger’s theorem, 9, 50, 272, 274, 279 
Mian—Chowla sequence, 254 
Minkowski inequality, 179 

bound for sphere packing, 254 

first theorem, 134 

second theorem, 135, 139, 142, 168, 474 
mode, 152 
moment generating function, 10 
monotone decreasing, 19 
monotone increasing, 27 

variables, 19 
monotonicity formula, 420 
Mordell sums, 188 
multiset, 38 
multiplicative energy, 96 
multiplicative group, 92 
multiplicative set, 92 


near-exact inverse sum set theorem, 202 
near-uniform, 406 

nearest neighbor, 326 

neighboring pair, 326 

neighbors, 247 

nilsystems, 451 

nodes, 247 

non-degenerate symmetric bilinear forms, 155 
Nullstellensatz, combinatorial, xiv, 329 
Nullstellensatz, Hilbert, 332 

number theory, 109 


octahedron, 126 
Olson’s problem, 485 
order of the group element, 58 
orthogonal complement, 153 
orthogonality, 149 

properties, 152 


Plünnecke inequalities, 73
parallelepiped, d-, 215 
center, 215 
corners, 215 
fundamental, 132 
Parseval identity, 152 
partial difference set, 79 
sum set, 79, 261 
partition, good, 482 
paths, 262, 263 
period, 55 
petals, 38 
phase functions, 164 
locally polynomial, 425 




phase polynomial, 425, 431 
linear phase polynomial, quadratic, 425 
pigeonhole principle, 42, 254 
dyadic, 86, 266 
planar graph, 310 
Plancherel’s theorem, 153, 165, 171, 
192 
Plünnecke inequalities, xiv, 53, 65, 74, 90, 228, 246, 269, 275
normalized, 269
theorem, 267, 269
Plünnecke–Ruzsa estimates, 269
Poincaré recurrence theorem, 415, 452 
points, 247 
Poisson summation formula, 155, 383 
Pólya–Vinogradov inequality, 165
polynomial bias, 425 
method, 329, 331 
phase function, 441, 447 
totally positive, 34 
Pontryagin dual, 150, 237, 238, 266 
poor set, 408 
popularity principle, 5 
for bipartite graphs, 247 
simultaneous, 266 
power set, xv 
prime number theorem, 45, 49 
primitive, 362 
elements, 347 
probabilistic method, xiv, 1, 246 
probability, of a group, 151 
progression, 114 
proper progression, xvii, 121, 122, 143, 223, 
230 
arithmetic progression, 194, 257, 370, 
372, 452, 465, 470, 471, 472, 473, 479, 
484 
generalized arithmetic progressions, 470 
Prékopa—Leindler inequality, 128, 129 
pseudo-random, 160, 161 
sets, 149, 417 
Pythagoras’ theorem, 404 


quadratic phase functions, 416, 429, 
432 
quadruple, good, 326 
quasi-periodic, 429 
functions, 445 
quotient group, 113 
set, 99 
space, 115 




radius, 166 
Ramsey theory, 246, 254 
for many colors, 256 
for two colors, 255 
random Bernoulli matrix, 297 
random graph, 33 
matrices, xiv 
walks, xiv 
rank, xix, 114 
of the Bohr set, 166 
full, 114 
of a subset, 211 
reflection principle, 17 
regular, 34 
Bohr set, 170 
relative Szemerédi theorem, 466 
compact extension, 451 
rich lines, 312, 324 
pairs, 312 
Riemann hypothesis, 45, 47 
Riemann manifolds, 221 
Riemann zeta function, 47, 417 
Riesz—Thorin complex interpolation theorem, 
157 
interpolation, 180 
rigidity parameter, 478 
Rogers–Shephard inequality, 130
root of unity, 362 
primitive, 362 
Roth’s theorem, 259, 354, 370, 372, 373, 375, 
376, 377, 389, 398, 407, 409, 415, 417, 
441, 454, 464 
for integers, 386 
in random subsets of torsion groups, 
381 
for p-torsion groups, 378 
Roth—Bourgain theorem, 392 
Rudin’s inequality, 176, 178, 193 
Ruzsa covering lemma, 53, 69, 73, 78, 90, 93, 
105, 171, 274, 435 
continuous version, 125 
Ruzsa distance, 60, 63, 64, 66, 67, 76, 102, 106, 
155 
left-invariant, 93 
right-invariant, 93 
Ruzsa metric, 68 
Ruzsa triangle inequality, 53, 60, 67, 78, 83, 
102, 275 
Ruzsa—Chang theorem, 229, 239, 475, 
476 
in arbitrary groups, 244 


subgraph, induced, 247 
Santaló’s inequality, 245
Schnirelmann density, 209 
Schur’s theorem, 254, 256, 259 
Schwartz-Zippel lemma, 331 
second moment method, 5, 6 
Selberg sieve, 466 
semi-random method, 251 
set, independent, 247 
set systems, 277, 454 
shift operator, 448 
shift, Bernoulli, 449 
circle, 448 
skew, 448 
Sidon set, 57, 58, 82, 172, 176, 226, 
252 
maximal, 253 
sieve theory, 47, 463, 466 
simplex removal lemma, 454, 455 
simplified, 34 
skew shift, 451 
small set, 481 
Smith normal form, 116, 118 
smoothing, 165 
Snevily’s conjecture, 342, 343, 345 
Sobolev norms, 34 
soft inverse theorem, 442, 445, 467 
span, 114, 247 
spectrum, 241, 382 
Spec_α(A), 149
Sperner systems, 277 
Sperner lemma, 278, 279 
Sperner product, 279 
sphere packing, 254 
splitting lemma, 116 
step, 120 
Stepanov’s method, 329, 357 
Stirling’s formula, 48 
strong mixing, 453 
strongly intersects, 323 
sub-lattice, 114 
subcomplete sequence, 480 
subset sums, xii 
problem, 276 
successive minima, 135 
sum set, xii, 470 
complete, 79 
estimates, 53, 112, 276 
partial, 79 
sum-free sets, 4 
subsets, 248 




sum-product estimates, 99, 105, 109, 158, 329 
problem, 315, 325, 327 
sunflower, 38 
lemma, 38 
superfactorial, 335 
support, of a random variable, 153 
Sylvester—Gallai theorem, 314 
symmetric set, 57, 68, 84 
symmetric convex body, 123 
symmetric progression, xvii 
symmetric sum set estimates, 73 
symmetry group, 55, 57, 200 
syndetic, 483 
Szemerédi regularity lemma, xiv, 246, 369, 398, 
406, 414, 415, 454, 455, 464 
Szemerédi’s theorem, xiv, 189, 260, 261, 369, 
370, 371, 376, 398, 414, 416, 417, 424, 
440, 445, 449, 451, 454, 455, 462, 463, 
465, 467, 470 
in an arbitrary group, 370 
infinitary ergodic approach, 448 
multi-dimensional, 452 
polynomial, 452 
relative, 465 
Szemerédi—Trotter theorem, 308, 311, 325 
generalized, 313 


tensor power trick, 67, 72, 248 
thin, 13 
bases, 34, 37, 40, 42, 44 
Tomas-Stein argument, 384 
inequality, 180 
torsion group, 113 
subgroup, 119 
torsion-free, 113, 116, 121, 209, 224 
additive group, 223, 235, 253 
groups, 236 
universal representation, 235, 236 
tower exponential, 407, 416, 460 
transference principle, 381 
tree argument, 476 
triangle, 247 
inequality, 9, 159, 178 
removal lemma, 408, 454, 455 
triangle-free graph, 251 
tripling constant, 67 
trivial sum set estimates, 54 
Turán’s theorem, 246, 248 
twin prime conjecture, Goldbach’s conjecture, 
466 
Tychonoff’s theorem, 26 




uncertainty principle, 160, 365 
uniform almost periodicity norms, 
444 
uniform Lipschitz control, 34 
uniformly almost periodic, 447 
functions, 445 
of order d — 1, 445 
union bound, 3 
union, disjoint, 222 
universal ambient groups, 233, 234 
universal characteristic factor, 450 
tail, 6 
bound, 33 
density, 21 
estimate, 2, 36 


Van der Corput lemma, 421, 427, 432, 
430 
van der Waerden’s theorem, 246, 254, 258, 371, 
414, 416, 441, 446 
Vandermonde determinant, 335 
permanent, 343 
variance, 2, 6 
Varnavides’ theorem, 373, 375, 377, 
417 
for p-torsion groups, 380 
vertex set, 454 
vertices, 247 
adjacent, 247 
Vinogradov’s theorem, 12, 51 
Vitali covering lemma, 170, 171 
volume-packing lemma, 132 
von Mangoldt function, 49, 463 
von Neumann ergodic theorem, 450, 451, 
453 
von Neumann theorem, generalized, 
421 
Vosper’s theorem, 205, 210 


Waring bases, 43 
Waring conjecture, 12 
weakly mixing, 453 
Weyl criterion, 165 
exponential sum estimate, 423 
Wiener algebra norm, 444 
Wiener space, 13 
Wilson’s theorem, 350 


Young inequality, 157 


zero frequency, 153 


